Title

Recombinant Megalomicin Biosynthetic Genes And Uses Thereof

Cross-Reference to Priority Application 
This application claims priority to provisional U.S. patent application 
Serial No. 60/158,305, filed 8 October 1999, and provisional U.S. patent 
application Serial No. 60/190,024, filed 17 March 2000, under 35 U.S.C. § 119(e).
The contents of the above-referenced applications are incorporated herein by
reference in their entirety.

Field of the Invention 
The present invention provides recombinant methods and materials for 
producing polyketides by recombinant DNA technology. The invention relates to 
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology. 






Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There are a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinosyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33:
9321-9326; McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.
Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference.

Polyketides are synthesized in nature by polyketide synthase (PKS)
enzymes. These enzymes, which are complexes of multiple large proteins, are
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS
enzymes are known; these differ in their composition and mode of synthesis.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II or "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-,
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300 kDa) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D., The Polyketide
Metabolites, E. Horwood: New York, 1991, incorporated herein by reference).

During the past half decade, the study of modular PKS function and
specificity has been greatly facilitated by the plasmid-based Streptomyces
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512; McDaniel et
al., 1993, Science 262: 1546-1557; and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages of
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora
megalomicea, a member of the Actinomycetales family of soil bacteria that
produces many types of biologically active compounds. Megalomicin is a
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor and
S. lividans and other host cells were possible. The present invention meets these
and other needs.



Summary of the Invention 
The present invention provides recombinant methods and materials for
expressing PKS enzymes and polyketide modification enzymes derived in whole
or in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that
constitute the complete PKS that ultimately results, in Micromonospora
megalomicea, in the production of megalomicin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender
modules 1 through 6, inclusive, of the megalomicin PKS.

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment,
the promoter is derived from another PKS gene. In a related embodiment, the
invention provides recombinant host cells comprising one or more expression
vectors that produce(s) megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.

In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the
library are synthesized by hybrid PKS enzymes of the invention. The resulting
polyketides can be further modified to convert them to other useful compounds,
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are
useful intermediates in the preparation of antiparasitics are of particular benefit.

In another related embodiment, the invention provides a method to prepare
a nucleic acid that encodes a modified PKS, which method comprises using the
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or
directly from DNA derived from organisms that produce megalomicin. In
addition, the invention provides methods to screen the resulting polyketide and
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics,
antiparasitics and other useful compounds derived therefrom. The compounds of
the invention can also be used in the manufacture of another compound. In a
preferred embodiment, the compounds of the invention are formulated in a
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of megalomicin
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated
nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound.

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORFs) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode a single, multiple, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment
encodes the loading module, a thioesterase domain, and all six extender modules
of the megalomicin PKS.
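
By way of illustration only, the modular organization described above can be sketched as a small data model. The Python sketch below is not part of the invention. The names Module, is_valid, count_extenders and toy_pks are hypothetical, and the minimal domain sets it checks (AT and ACP for a loading module; KS, AT and ACP for an extender module) are assumptions drawn from the general description of modular PKSs, not a statement of the actual domain arrangement of the megalomicin PKS.

```python
from dataclasses import dataclass
from typing import Tuple

# Domain abbreviations named in the text: thioesterase (TE), ketosynthase (KS),
# acyltransferase (AT), acyl carrier protein (ACP), ketoreductase (KR),
# dehydratase (DH) and enoylreductase (ER).
KNOWN_DOMAINS = frozenset({"TE", "KS", "AT", "ACP", "KR", "DH", "ER"})


@dataclass(frozen=True)
class Module:
    """One PKS module, described only by the domains it carries."""
    name: str
    domains: Tuple[str, ...]
    is_loading: bool = False

    def is_valid(self) -> bool:
        """Check the minimal domain content ASSUMED for this sketch."""
        present = set(self.domains)
        if not present <= KNOWN_DOMAINS:
            return False  # unknown domain label
        # Assumption: loading module needs AT + ACP; extender needs KS + AT + ACP.
        required = {"AT", "ACP"} if self.is_loading else {"KS", "AT", "ACP"}
        return required <= present


def count_extenders(modules) -> int:
    """Number of extender (non-loading) modules in a PKS description."""
    return sum(1 for m in modules if not m.is_loading)


# Toy layout for illustration only (hypothetical, NOT the megalomicin PKS):
# one loading module, six extender modules, with a TE on the last module.
toy_pks = [Module("loading", ("AT", "ACP"), is_loading=True)]
toy_pks += [Module(f"extender {i}", ("KS", "AT", "KR", "ACP")) for i in range(1, 6)]
toy_pks.append(Module("extender 6", ("KS", "AT", "KR", "ACP", "TE")))
```

A description of this kind makes it easy to check that a proposed construct still contains a loading module and the expected number of extender modules after a domain has been deleted or substituted.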

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth
in SEQ. ID NO:1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists of, or consists essentially of a nucleic
acid having a nucleotide sequence set forth in SEQ. ID NO:1.

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

In still another specific embodiment, the invention provides an antibody, or
a fragment or derivative thereof, which immuno-specifically binds to a domain of
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme.
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant
host cell comprising the above-described recombinant DNA expression vector
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than
one megalomicin PKS or megalomicin modification enzyme, or domains thereof,
such nucleic acid material can be located at a single genetic locus, e.g., on a single
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell, which comprises at least two separate
autonomously replicating recombinant DNA expression vectors, and each of said
vectors comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell, which comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, wherein each of said vector(s) and each of said
modified chromosome(s) comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter. Preferably, the autonomously replicating recombinant DNA
expression vector and/or the modified chromosome further comprise distinct
selectable markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or
more of the meg PKS genes contains one or more domain alterations, such as a
deletion or substitution of a meg PKS domain with a domain from another PKS.

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.

These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below.

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp (i.e., 5K, 10K, and the like) increments on a solid
bar beneath the arrows indicating the genes.

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene
cluster. The various open reading frames are shown as arrows pointing in the
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS079-138B and
pKOS079-124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are
indicated.

Figure 6 shows the biosynthetic pathway for the formation of desosamine,
rhodosamine, and mycarose, as well as the genes that produce the various enzymes
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora 
megalomicea megalomicin biosynthetic genes (GenBank Accession No. 
AF263245, incorporated herein by reference). 

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and
certain cosmids of the invention that comprise portions of the cluster.

Figure 10 depicts the biosynthesis of megosamine, mycarose, and
desosamine.

Detailed Description of the Invention 
The present invention provides useful compounds and methods for
producing polyketides in recombinant host cells. As used herein, the term
recombinant refers to a compound or composition produced by human
intervention. The invention provides recombinant DNA compounds encoding all
or a portion of the megalomicin biosynthetic genes. The invention provides
recombinant expression vectors useful in producing the megalomicin PKS and
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host
cells. The invention also provides the polyketides produced by the recombinant
PKS and polyketide modification enzymes.

To appreciate the many and diverse benefits and applications of the
invention, the description of the invention below is organized as follows. In
Section I, common definitions used throughout this application are provided. In
Section II, structural and functional characteristics of megalomicin are described.
In Section III, the recombinant megalomicin biosynthetic genes and other
recombinant nucleic acids provided by the invention are described. In Section IV,
polypeptides and proteins encoded by the megalomicin biosynthetic genes and
antibodies that specifically bind to such polypeptides and proteins provided by the
invention are described. In Section V, methods for heterologous expression of the
megalomicin biosynthetic genes provided by the invention are described. In
Section VI, the hybrid PKS genes provided by the invention are described. In
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention
and pharmaceutical compositions of those compounds are described. The detailed
description is followed by working examples illustrating the invention.

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.

Section I. Definitions 

As used herein, domain refers to a portion of a molecule, e.g., proteins or
nucleic acids, that is structurally and/or functionally distinct from another portion
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain. 

As used herein, biological activity refers to the in vivo activities of a
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses
therapeutic effects and pharmaceutical activity of such compounds, compositions
and mixtures. Biological activities may be observed in in vitro systems designed
to test or use such activities.

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution,
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination
thereof.

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally
associated refers to the functional relationship of DNA with regulatory and
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem., 266:
19867-19870 (1991)) can be inserted immediately 5' of the start codon and may
enhance expression. The desirability of (or need for) such modification may be
empirically determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment
of DNA or RNA that controls transcription of the DNA or RNA to which it is
operatively linked. The promoter region includes specific sequences that are
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be cis
acting or may be responsive to trans acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.

As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.
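
For convenience, the three stringency conditions defined above can be recorded as a simple lookup table. The following Python sketch is illustrative only. The names WashCondition, STRINGENCY and describe are hypothetical and are introduced solely for this example; the numerical values are those recited in the definition.

```python
from typing import NamedTuple


class WashCondition(NamedTuple):
    """One hybridization stringency condition (hypothetical container)."""
    sspe_x: float       # SSPE concentration, as a multiple of 1 x SSPE
    sds_percent: float  # SDS concentration, in percent
    temp_c: int         # temperature, in degrees Celsius


# Values taken from the definition of stringency given in the text.
STRINGENCY = {
    "high": WashCondition(sspe_x=0.1, sds_percent=0.1, temp_c=65),
    "medium": WashCondition(sspe_x=0.2, sds_percent=0.1, temp_c=50),
    "low": WashCondition(sspe_x=1.0, sds_percent=0.1, temp_c=50),
}


def describe(level: str) -> str:
    """Return a one-line description of the named stringency level."""
    c = STRINGENCY[level]
    return f"{level} stringency: {c.sspe_x:g} x SSPE, {c.sds_percent:g}% SDS, {c.temp_c} C"
```

Calling describe("high") returns the high-stringency condition as a single line. Lower salt and higher temperature correspond to higher stringency, which is why the high condition has the lowest SSPE concentration and the highest temperature of the three.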



WO 01/27284

PCT/US00/27433



The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at
least 70%, preferably means at least 80%, more preferably at least 90%, and most
preferably at least 95% identity.

As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.
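
The percent identity thresholds recited above for substantially identical sequences (70%, 80%, 90% and 95%) can be illustrated with a short calculation. The Python sketch below is a minimal example under stated assumptions and is not a method prescribed by this application: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identity over all alignment columns in which at least one sequence has a residue. Other conventions for the denominator exist. The function names percent_identity and identity_tier are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity is counted over every column in which at least
    one of the two sequences has a residue; gap-only columns are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the tiers recited in the definition."""
    if pct >= 95:
        return "most preferably (at least 95%)"
    if pct >= 90:
        return "more preferably (at least 90%)"
    if pct >= 80:
        return "preferably (at least 80%)"
    if pct >= 70:
        return "generally (at least 70%)"
    return "below 70%"
```

For example, two aligned four-residue sequences differing at one position have 75% identity and fall in the "generally (at least 70%)" tier.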

As used herein, isolated means that a substance is either present in a
preparation at a concentration higher than that substance is found in nature or in its
naturally occurring state or that the substance is present in a preparation that
contains other materials with which the substance is not associated in nature.
As an example of the latter, an isolated meg PKS protein includes a meg PKS
protein expressed in a Streptomyces coelicolor or S. lividans host cell.

As used herein, substantially pure means sufficiently homogeneous to
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.

As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome.

Section II. Megalomicins

The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are
6-O-glycosides of erythromycin C with acetyl or propionyl groups esterified at the
3''' or 4''' hydroxyls of the mycarose sugar at the C-3 position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials,
they act like the erythromycins, which inhibit protein synthesis at the translocation
step by selective binding to the bacterial 50S ribosomal RNA. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell. Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (Tox50, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:201-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an
IC50 of 1 µg/ml in blocking intracellular replication of Plasmodium falciparum
infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 µg/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 µg/ml, respectively).
Megalomicin is also active against the intracellular replicative, amastigote form of
T. cruzi, completely preventing its replication in infected murine LLC/MK2
macrophages at a dose of 5 µg/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections.
Because the erythromycins do not have such activity, although azithromycin
(Figure 3) has been reported to be an effective acute and prophylactic treatment for
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be
developed into potent antimalarial drugs with a high therapeutic index and be
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread
use of the erythromycins and their good oral availability plus the low mammalian
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44: 271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through
successive condensations of activated coenzyme-A thioester monomers derived
from small organic acids such as acetate, propionate, and butyrate. Active sites
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing
activities. Active sites that perform these reactions include a ketoreductase (KR),
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain,
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g.,
glycosylation, oxidation, acylation) to achieve the final active compound.
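
By way of illustration only, the correspondence between the beta-keto processing domains present in a module and the functional group left at the beta-carbon can be written as a short Python sketch. The function name and the string labels are hypothetical conveniences rather than terms of the invention, and the sketch assumes that a DH or ER domain has no effect unless the domain that acts before it is also present.

```python
def beta_carbon_group(domains):
    """Return the functional group expected at the beta-carbon.

    `domains` is any iterable of reductive domain labels ("KR", "DH", "ER")
    present in one module. Assumption: the domains act in the fixed order
    KR -> DH -> ER, so a later domain without its predecessor does nothing.
    """
    present = set(domains)
    if "KR" not in present:
        return "ketone"      # no processing: the beta-keto group is retained
    if "DH" not in present:
        return "hydroxyl"    # KR alone reduces the ketone to an alcohol
    if "ER" not in present:
        return "alkene"      # KR + DH: dehydration gives a double bond
    return "alkane"          # KR + DH + ER: complete reduction


for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
    print(combo, "->", beta_carbon_group(combo))
```

The four combinations printed by the loop are the four cases enumerated in the preceding paragraph.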

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No.
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the
best characterized and most extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the
enzymatic steps for each round of condensation and reduction are encoded within
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large
polypeptides encoded by three open reading frames (ORFs, designated eryAI,
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no
functional reductive domain. Following the condensation and appropriate
dehydration and reduction reactions, the enzyme bound intermediate is lactonized
by the TE at the end of extender module 6 to form 6-dEB.
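
The carbon bookkeeping implied by this description can be checked with a brief Python sketch. The variable and function names are hypothetical, and the per-unit carbon counts are assumptions drawn from the chemistry described here: a propionyl starter contributes three carbons, and each methylmalonyl extender contributes two backbone carbons plus one methyl branch after decarboxylation.

```python
# Subunit layout of DEBS as described in the text.
DEBS_SUBUNITS = {
    "DEBSI": ["loading", 1, 2],
    "DEBSII": [3, 4],
    "DEBSIII": [5, 6],
}

STARTER_CARBONS = 3        # propionyl starter unit
BACKBONE_PER_EXTENDER = 2  # each condensation lengthens the backbone by two carbons
BRANCH_PER_EXTENDER = 1    # methyl branch retained from each methylmalonyl unit


def carbon_count(subunits):
    """Return (extender modules, backbone carbons, total carbons)."""
    extenders = sum(
        1 for modules in subunits.values() for m in modules if m != "loading"
    )
    backbone = STARTER_CARBONS + BACKBONE_PER_EXTENDER * extenders
    total = backbone + BRANCH_PER_EXTENDER * extenders
    return extenders, backbone, total


print(carbon_count(DEBS_SUBUNITS))  # (6, 15, 21)
```

The total of 21 carbons agrees with the molecular formula of 6-dEB, C21H38O6.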

More particularly, the loading module of DEBS consists of two domains,
an acyl-transferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of
the loading module migrates to form a thiol ester (trans-esterification) at the KS of
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit
(elongation or extension). The growing polyketide chain is transferred from the
ACP to the KS of the next module, and the process continues.

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta-keto group of each
two-carbon unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, i.e., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A include the
actions of a number of modification enzymes that carry out C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via
O-methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 

Section III: The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids, which were analyzed by DNA sequence
analysis and restriction enzyme digestion. This analysis revealed that the desired
DNA had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids, and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ______; cosmid pKOS079-124B is available
under accession no. ATCC ______; cosmid pKOS079-93D is available under
accession no. ATCC ______; and cosmid pKOS079-93A is available under
accession no. ATCC ______). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown on Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of
the genetic code, a variety of DNA compounds differing in their nucleotide
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the
invention includes DNA compounds of any sequence that encode the amino acid
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a
desired activity. The present invention includes such polypeptides with alternate
amino acid sequences, and the amino acid sequences encoded by the DNA
sequences shown herein merely illustrate preferred embodiments of the invention.

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form.
The DNA molecules of the invention comprise one or more sequences that encode
one or more domains (or fragments of such domains) of one or more modules in
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and
TE domains of at least one of the 6 extender modules and loading module of the
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster.
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either
case, the vector can be a stable vector (i.e., the vector remains present over many
cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as
6-dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and/or C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as
N-dimethyl-L-daunosamine; the daunosamine genes have been characterized from
Streptomyces peucetius (see Colombo and Hutchinson, J. Indust. Microbiol.
Biotechnol., in press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and
references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway to a
heterologous host.

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-3262;
Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Natl. Acad. Sci. USA 95:12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.



The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO:1 (see also Figure 7; note some gene
designations may be different in Figure 7).

Table 1. Megalomicin Biosynthetic Gene Cluster
Micromonospora megalomicea subsp. nigra (ATCC 27598)

Location                     Description
1..2451                      sequence from cosmid pKOS079-138B
complement(1..144)           megBVI (or megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase
928..2061                    megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase
2072..3382                   megDI, TDP-megosaminyl transferase (eryCIII homolog)
2452..40397                  sequence of cosmid pKOS079-93D
3462..4634                   megG (or megY), mycarosyl acyltransferase
4651..5775                   megDII, deoxysugar transaminase (eryCI, DnrJ homolog)
5822..6595                   megDIII, TDP-daunosaminyl-N,N-dimethyltransferase (eryCVI homolog)
6592..7197                   megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU homolog)
7220..8206                   megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog)
complement(8228..9220)       megBIIA or megDVII, TDP-4-keto-L-6-deoxy-hexose 2,3-reductase
complement(9226..10479)      megBV, TDP-mycarosyl transferase
complement(10483..11424)     megBIV, TDP-hexose 4-ketoreductase
12181..22821                 megAI
12181..13791                   Loading Module (L)
12505..13470                     AT-L
13576..13791                     ACP-L
13849..18207                   Extender Module 1 (1)
13849..15126                     KS1
15427..16476                     AT1
17155..17694                     KR1
17947..18207                     ACP1
18268..22575                   Extender Module 2 (2)
18268..19548                     KS2
19876..20910                     AT2
21517..22053                     KR2
22318..22575                     ACP2
22867..33555                 megAII
22957..27258                   Extender Module 3 (3)
22957..24237                     KS3
24544..25581                     AT3
26230..26733                     KR3 (inactive)
26998..27258                     ACP3
[illegible]                    Extender Module 4 (4)
27393..28590                     KS4
28897..29931                     AT4
29953..30477                     DH4
31396..32244                     ER4
32257..32799                     KR4
33052..33312                     ACP4
33666..43271                 megAIII
33780..38120                   Extender Module 5 (5)
33780..35027                     KS5
35385..36419                     AT5
37068..37604                     KR5
37860..38120                     ACP5
38187..42425                   Extender Module 6 (6)
38187..39470                     KS6
39795..40811                     AT6
40398..46641                 sequences from cosmid pKOS079-93A
41406..41936
42168..42425
42585..43271
43268..44344
44355..45623
45620..46591
complement(46660..47403)
complement(47411..47980)
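
The coordinates in Table 1 follow a start..end convention, with complement(...) marking a coding region on the opposite strand. As a minimal sketch, assuming that the coordinates are 1-based and inclusive (the usual sequence-database convention, not stated explicitly in the table), such entries can be parsed and converted to lengths in Python. The function names are hypothetical.

```python
import re

# Matches "12181..22821" or "complement(9226..10479)".
_LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_location(text):
    """Return (start, end, strand) for a Table 1 style location string."""
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    strand = "-" if match.group(1) else "+"
    return int(match.group(2)), int(match.group(3)), strand


def length_bp(text):
    """Length in base pairs, assuming 1-based inclusive coordinates."""
    start, end, _ = parse_location(text)
    return end - start + 1


megAI = "12181..22821"
megBV = "complement(9226..10479)"
print(length_bp(megAI), length_bp(megAI) // 3)  # 10641 3547
print(parse_location(megBV))                    # (9226, 10479, '-')
```

Under that assumption, the megAI entry spans 10,641 bp, or 3,547 codons including the stop codon.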

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of the
megalomicin polyketide synthase or a megalomicin modification enzyme. The
isolated nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence
that is complementary to the nucleotide sequence encoding a domain of
megalomicin PKS or a megalomicin modification enzyme is also provided.

The isolated nucleic acid fragment can comprise a single ORF, multiple
ORFs, or all of the open reading frames (ORFs) of the megalomicin PKS or the
megalomicin modification enzyme. Exemplary ORFs of megalomicin PKS include
the ORFs of the megAI, megAII and megAIII genes. The isolated nucleic acids of
the invention also include nucleic acids that encode one or more domains and one
or more modules of the megalomicin PKS. Exemplary domains of the megalomicin
PKS include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain,
and all six extender modules of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by megF,
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose,
megosamine or desosamine, which are used as biosynthetic intermediates in the
biosynthesis of various megalomicin species and other related polyketides. The
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of
or entire megalomicin biosynthetic genes are collectively referred to as
megalomicin biosynthetic nucleic acid(s).

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO:1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be
single or double stranded. Nucleic acids that hybridize to or are complementary to
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the
strand so that the inverse complement would hybridize without mismatches to the
nucleic acid strand) are also provided. In specific aspects, nucleic acids are
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene.
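
For illustration, the inverse complement defined above can be computed with a few lines of Python. The function name is hypothetical, and the sketch assumes an unambiguous DNA alphabet (A, C, G, T); ambiguity codes and RNA are not handled.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def inverse_complement(seq):
    """Complement every base, then reverse the strand orientation."""
    return seq.translate(_COMPLEMENT)[::-1]


strand = "ATGC"
print(inverse_complement(strand))                      # GCAT
print(inverse_complement(inverse_complement(strand)))  # ATGC
```

Applying the operation twice returns the original strand, which is a convenient sanity check.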

The megalomicin biosynthetic nucleic acids provided herein include those
with nucleotide sequences encoding substantially the same amino acid sequences
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV.

Some regions within the megalomicin PKS genes are highly homologous
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences
encoded by said changed DNA codons (see the examples below). The stability of
the nucleic acid or DNA can also be improved by codon changes that reduce or
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from eryAIII, oleAIII,
picAIII, or picAIV genes.
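
The idea of lowering nucleotide identity through synonymous codon changes can be sketched in Python as follows. This is only an illustrative toy under stated assumptions: it uses the standard genetic code, the function names and the 12-nucleotide test sequence are hypothetical, and it simply picks the first available synonymous codon, whereas a practical design would also weigh the codon usage of the intended host.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Swap each codon for a different synonymous codon where one exists."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[codon] and c != codon]
        out.append(synonyms[0] if synonyms else codon)  # Met and Trp stay as-is
    return "".join(out)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "ATGGCCGAGCTG"          # hypothetical fragment encoding M-A-E-L
recoded = recode(original)
assert translate(recoded) == translate(original)
print(recoded, round(identity(original, recoded), 2))
```

The assertion confirms that the encoded amino acid sequence is unchanged while the nucleotide identity to the original drops below 100%.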

The recombinant DNA compounds of the invention that encode the
megalomicin PKS and modification proteins or portions thereof are useful in a
variety of applications. While many of these applications relate to the heterologous
expression of the megalomicin biosynthetic genes or the construction of hybrid
PKS enzymes, many useful applications involve the natural megalomicin producer
Micromonospora megalomicea. For example, one can use the recombinant DNA
compounds of the invention to disrupt the megalomicin biosynthetic genes by
homologous recombination in Micromonospora megalomicea. The resulting host
cell is a preferred host cell for making polyketides modified by oxidation,
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin,
because the genes that encode the proteins that perform these reactions are of
course present in the host cell, and because the host cell does not produce
megalomicin that could interfere with production or purification of the polyketide
of interest.

One illustrative recombinant host cell provided by the present invention
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding
modules of another PKS. The utility of these constructs is that host cells
expressing, or cell free extracts containing, a PKS comprising the protein encoded
thereby can be fed or supplied with N-acylcysteamine thioesters of precursor
molecules to prepare a polyketide of interest. See U.S. patent application Serial
No. 09/492,773, filed 27 Jan. 2000, and PCT patent publication No. 00/44717,
both of which are incorporated herein by reference. Such KS1° constructs of the
invention are useful in the production of 13-substituted-megalomicin compounds
in Micromonospora megalomicea host cells. Preferred compounds of the invention
include those compounds in which the substituent at the 13-position is propyl,
vinyl, propargyl, other lower alkyl, and substituted alkyl.

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and
provide monoketide substrates for processing by the remainder of the PKS.

The compounds of the invention can also be used to construct recombinant
host cells of the invention in which coding sequences for one or more domains or
modules of the megalomicin PKS or for another megalomicin biosynthetic gene
have been deleted by homologous recombination with the Micromonospora
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the
compounds used in the recombination process are characterized by their homology
with the chromosomal DNA and not by encoding a functional protein due to their
intended function of deleting or otherwise altering portions of chromosomal DNA.
For this and a variety of other applications, the compounds of the present
invention include not only those DNA compounds that encode functional proteins
but also those DNA compounds that are complementary or identical to any portion
of the megalomicin biosynthetic genes.

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173: 7004-11; and Takada et al., 1994, J. Antibiot. 47: 1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is
desired to replace the disrupted function with a gene product expressed by a
recombinant DNA expression vector. While such expression vectors of the
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this
type include those in which the coding sequence for the loading module has been
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated or
disrupted.

While the present invention provides many useful compounds having
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes
in cells other than M. megalomicea, as described in Section V.

Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes 

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO: 1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known
in the art for nucleic acid hybridization and cloning (e.g., as described in Section
III) in accordance with the methods of the present invention.

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can
be supplied by the native promoter for a megalomicin biosynthetic gene and/or
flanking regions.

A variety of host-vector systems may be utilized to express the protein
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used.

In a specific embodiment, a vector is used that comprises a promoter
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic
enzyme, or a domain, fragment, derivative or homolog thereof, one or more
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene).

Expression vectors containing the sequences of interest can be identified
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be
detected by nucleic acid hybridization to probes comprising sequences
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics,
occlusion body formation in baculovirus, and the like) caused by insertion of the
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be
identified by the absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying for the megalomicin
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression
vectors can be propagated and amplified in quantity. As previously described, the
expression vectors or derivatives which can be used include, but are not limited to:
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors.

In addition, a host cell strain may be chosen that modulates the expression
of the inserted sequences, or modifies or processes the expressed proteins in the
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation, phosphorylation,
and the like) of proteins. Appropriate cell lines or host systems can be chosen to
ensure the desired modification and processing of the foreign protein is achieved.
For example, expression in a bacterial system can be used to produce an
unglycosylated core protein, while expression in mammalian cells ensures
"native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by
altering their sequences by substitutions, additions or deletions that provide for
functionally equivalent molecules. Due to the degeneracy of nucleotide coding
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue within
the sequence, thus producing a silent change. Likewise, the megalomicin biosynthetic
enzyme derivatives of the invention include, but are not limited to, those
containing, as a primary amino acid sequence, all or part of the amino acid
sequence of megalomicin biosynthetic enzymes, including altered sequences in
which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a
similar polarity which acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence may be selected
from other members of the class to which the amino acid belongs. For example,
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
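
The residue classes and silent codon changes described above can be illustrated with a simple lookup. The following is a minimal Python sketch only; the names AMINO_ACID_CLASSES, residue_class, is_conservative, CODONS and is_silent are hypothetical and do not appear in the specification. The codon table is deliberately partial (alanine and leucine from the standard genetic code). Membership of two residues in one class does not by itself establish that a substitution preserves function, which would still need to be confirmed experimentally.

```python
# Residue classes exactly as listed in the text (one-letter codes).
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}

# Partial standard genetic code (DNA codons), for illustration only.
CODONS = {
    "A": {"GCT", "GCC", "GCA", "GCG"},
    "L": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
}


def residue_class(residue):
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original, substitute):
    """True if both residues belong to the same class listed in the text."""
    return residue_class(original) == residue_class(substitute)


def is_silent(codon_a, codon_b):
    """True if both codons are in the partial table and encode the same residue."""
    return any(codon_a in c and codon_b in c for c in CODONS.values())
```

For instance, is_conservative("L", "I") holds because leucine and isoleucine are both listed as nonpolar, whereas a lysine-to-aspartate change crosses from the basic to the acidic class.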

In a specific embodiment of the invention, nucleic acids encoding
proteins, and proteins, consisting of or comprising a domain or a fragment of a
megalomicin biosynthetic enzyme consisting of at least 6 (continuous) amino
acids are provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of a megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to a megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention) or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
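
The identity thresholds recited above can be made concrete for the simplest case, two sequences of identical size compared position by position. The following Python sketch is illustrative only: percent_identity and meets_threshold are hypothetical helper names, and the sketch performs no alignment and no gap handling, both of which an alignment program of the kind referred to in the text would provide.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (align them first)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions at which the two residues are the same.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a, seq_b, threshold=60.0):
    """True if the sequences share at least `threshold` percent identity."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Two ten-residue sequences that differ at a single position would score 90% under this definition and would therefore satisfy every threshold listed above except 95%.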

The megalomicin biosynthetic enzyme domains, derivatives and analogs of
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned
megalomicin biosynthetic gene sequence can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in
vitro.

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions
and/or form new restriction endonuclease sites or destroy pre-existing ones, to
facilitate further in vitro modification. Any technique for mutagenesis known in
the art can be used in accordance with the methods of the present invention,
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual
gene product can be isolated and analyzed. This is achieved by assays based on
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like.

The megalomicin biosynthetic enzyme proteins may be isolated and
purified by standard methods known in the art from recombinant host cells
expressing the complexes or proteins in accordance with the methods of the
invention, including but not restricted to column chromatography (e.g., ion
exchange, affinity, gel exclusion, reversed-phase high pressure, fast protein liquid,
and the like), differential centrifugation, differential solubility, or by any other
standard technique used for the purification of proteins. Functional properties
may be evaluated using any suitable assay known in the art in accordance with the
methods of the present invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or
derivative is identified, the amino acid sequence of the protein can be deduced
from the nucleotide sequence of the gene which encodes it. As a result, the
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).

Manipulations of megalomicin biosynthetic enzymes may be made at the
protein level. Included within the scope of the invention are megalomicin
biosynthetic enzyme domains, derivatives or analogs or fragments, which are
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule
or other cellular ligand, and the like. Any of numerous chemical modifications
may be carried out by known techniques, including but not limited to specific
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin
biosynthetic enzyme can be chemically synthesized. For example, a peptide
corresponding to a portion of a megalomicin biosynthetic enzyme, which
comprises the desired domain or which mediates the desired activity in vitro, can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotary) or L
(levorotary).

In cases where natural products are suspected of being mutant or are
isolated from new species, the amino acid sequence of the megalomicin
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the
isolated protein. Such analysis may be performed by manual sequencing or
through use of an automated amino acid sequenator.

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA 78:3824-3828
(1981)). A hydrophilicity profile can be used to identify the hydrophobic
and hydrophilic regions of the proteins, and help predict their orientation in
designing substrates for experimental manipulation, such as in binding
experiments, antibody synthesis, and the like. Secondary structural analysis can
also be done to identify regions of the megalomicin biosynthetic enzyme that
assume specific structures (Chou and Fasman, Biochemistry 13:222-23 (1974)).
Manipulation, translation, secondary structure prediction, hydrophilicity and
hydrophobicity profiles, open reading frame prediction and plotting, and
determination of sequence homologies, can be accomplished using computer
software programs available in the art.
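
A hydrophilicity profile of the kind referred to above is, in essence, a sliding-window average of per-residue hydrophilicity values. The following Python sketch is offered only as an illustration under stated assumptions: the scale values are those commonly tabulated for the Hopp and Woods method and should be checked against the cited publication before use, the six-residue window is an assumed default, and the function names are hypothetical.

```python
# Hydrophilicity values as commonly tabulated for the Hopp and Woods scale
# (assumption: verify against the original publication before relying on them).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Return the mean hydrophilicity of every window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=6):
    """Return (start index, subsequence) of the highest-scoring window."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]
```

Windows with the highest average score are candidate surface-exposed regions, which is why such profiles are used when choosing peptide fragments for antibody production.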

Other methods of structural analysis, including but not limited to X-ray
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, In: Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York), can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immuno-specifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immuno-specifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 1, or a
fragment or derivative of said antibody containing the binding domain thereof, is
provided. Preferably, the antibody is a monoclonal antibody.

The megalomicin biosynthetic enzyme protein and domains, fragments,
homologs and derivatives thereof may be used as immunogens to generate
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab
fragments, and an Fab expression library.

Various procedures known in the art may be used for the production of
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention.

For production of the antibody, various host animals can be immunized by
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV virus in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin
biosynthetic enzyme protein together with genes from a human antibody molecule
of appropriate biological activity can be used; such antibodies are within the scope
of this invention.

Techniques described for the production of single chain antibodies (U.S.
patent 4,946,778) can be adapted to produce megalomicin biosynthetic
enzyme-specific single chain antibodies. An additional embodiment utilizes the
techniques described for the construction of Fab expression libraries (Huse et al.,
Science 246:1275-1281 (1989)) to allow rapid and easy identification of monoclonal
Fab fragments with the desired specificity for megalomicin biosynthetic enzyme, or
domains, derivatives, or analogs thereof. Non-human antibodies can be
"humanized" by known methods (see, e.g., U.S. Patent No. 5,225,539).

Antibody fragments that contain the idiotypes of a megalomicin
biosynthetic enzyme can be generated by techniques known in the art in
accordance with the methods of the present invention. For example, such
fragments include but are not limited to: the F(ab')2 fragment which can be
produced by pepsin digestion of the antibody molecule; the Fab' fragments that
can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab
fragments that can be generated by treating the antibody molecule with papain
and a reducing agent; and Fv fragments.

In the production of antibodies, screening for the desired antibody can be
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic
enzyme, one may assay generated hybridomas for a product that binds to the
fragment of a megalomicin biosynthetic enzyme that contains such a domain.

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention.

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes

In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of
the invention, any host cell other than Micromonospora megalomicea is a
heterologous host cell. Thus, included within the scope of the invention in
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors
that include such nucleic acids. The term expression vector refers to a nucleic acid
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any
cellular compartment, such as a replicating vector in the cytoplasm. An expression
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a
ribosome-binding site sequence positioned upstream of the start codon of the
coding sequence of the gene to be expressed. Other elements, such as enhancers,
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e.,
genes that confer antibiotic resistance or sensitivity, are preferred and confer a
selectable phenotype on transformed cells when the cells are grown in an
appropriate selective medium.

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components
suitable for the expression of genes and maintenance of vectors in E. coli, yeast,
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells.
Promoters can comprise regulatory sequences that allow for regulation of
expression relative to the growth of the host cell or that cause the expression of a
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase, bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression
system, which, in turn, is composed of at least a portion of the megalomicin PKS
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by
transformation with the recombinant DNA expression vectors of the invention to
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom.

Preferred host cells for purposes of selecting vector components for
expression vectors of the present invention include fungal host cells such as yeast
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that
ordinarily do not produce polyketides, it may be necessary to provide, also
typically by recombinant means, suitable holo-ACP synthases to convert the
recombinantly produced PKS to functionality. Provision of such enzymes is
described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells
for purposes of the present invention are Streptomyces and Saccharopolyspora
host cells, as discussed in greater detail below.

In a preferred embodiment, the expression vectors of the invention are
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for
expressing polyketides, because polyketides are naturally produced in certain
Streptomyces species, and Streptomyces cells generally produce the precursors
needed to form the desired polyketide. Those of skill in the art will recognize that,
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of
only those genes constituting the remainder of the desired PKS enzyme or other
polyketide-modifying enzymes. Thus, such a vector may comprise only a single
ORF, with the desired remainder of the polypeptides constituting the PKS
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may
be desirable to modify the host so as to prevent the production of endogenous
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS.

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can
be employed for purposes of the present invention.

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic resistance
conferring genes selected from the group consisting of the ermE (confers
resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes. Alternatively, several polyketides are
naturally colored, and this characteristic can provide a built-in marker for
identifying cells.
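
For illustration only, the marker genes listed above can be tabulated as a simple lookup. The following Python sketch is not part of the disclosed methods; the names SELECTABLE_MARKERS and markers_for are hypothetical, and the table holds only the gene and antibiotic pairs stated in the preceding paragraph.

```python
# Illustrative lookup of the selectable markers named in the text.
# Keys are marker genes; values are the antibiotics each is stated to resist.
SELECTABLE_MARKERS = {
    "ermE": ["erythromycin", "lincomycin"],
    "tsr": ["thiostrepton"],
    "aadA": ["spectinomycin", "streptomycin"],
    "aacC4": ["apramycin", "kanamycin", "gentamicin",
              "geneticin (G418)", "neomycin"],
    "hyg": ["hygromycin"],
    "vph": ["viomycin"],
}


def markers_for(antibiotic):
    """Return the sorted marker genes listed as conferring resistance."""
    wanted = antibiotic.lower()
    return sorted(
        gene
        for gene, drugs in SELECTABLE_MARKERS.items()
        if any(wanted == drug.lower() for drug in drugs)
    )
```

A call such as markers_for("thiostrepton") returns the matching genes; an antibiotic that is not in the table yields an empty list.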

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g., by
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans K4-114
and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the
megalomicin biosynthetic genes or only a subset of the same. For example, if only
the genes for the megalomicin PKS are expressed in a host cell that otherwise does
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by
adding them to the fermentation of a strain such as, for example, Streptomyces
antibioticus or Saccharopolyspora erythraea, that contains the requisite
modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
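
The conversions just described, and the use of mutant strains to accumulate a preferred compound, can be sketched as a small graph traversal. The Python below is a simplified illustration under stated assumptions: the names ERY_PATHWAY, reachable_products, and end_products are hypothetical, each gene is treated as fully active or fully inactive, and only the steps named in the preceding paragraph are modeled. Real strains may accumulate mixtures that this sketch does not capture.

```python
# Each entry maps a substrate to (gene, product, modification) steps
# taken from the description of the S. erythraea conversions above.
ERY_PATHWAY = {
    "6-dEB": [("eryF", "erythronolide B", "C-6 hydroxylation")],
    "erythronolide B": [
        ("eryBV", "3-O-mycarosylerythronolide B", "L-mycarose at C-3")
    ],
    "3-O-mycarosylerythronolide B": [
        ("eryCIII", "erythromycin D", "D-desosamine at C-5")
    ],
    "erythromycin D": [
        ("eryG", "erythromycin B", "methylation of mycarose"),
        ("eryK", "erythromycin C", "C-12 hydroxylation"),
    ],
    "erythromycin C": [
        ("eryG", "erythromycin A", "methylation of mycarose")
    ],
}


def reachable_products(start, inactive_genes=()):
    """Return compounds reachable from start when some genes are inactive."""
    blocked = set(inactive_genes)
    seen, stack = set(), [start]
    while stack:
        compound = stack.pop()
        for gene, product, _ in ERY_PATHWAY.get(compound, []):
            if gene not in blocked and product not in seen:
                seen.add(product)
                stack.append(product)
    return seen


def end_products(start, inactive_genes=()):
    """Return compounds that no remaining active gene can convert further."""
    blocked = set(inactive_genes)
    candidates = reachable_products(start, blocked) | {start}
    return sorted(
        c for c in candidates
        if not any(g not in blocked for g, _, _ in ERY_PATHWAY.get(c, []))
    )
```

Under this model, feeding 6-dEB to a strain with eryG inactive ends at erythromycin C, which matches the use of an eryG mutant as a source of erythromycin C elsewhere in this description.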

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain its corresponding transferase gene,
and does not contain the acylation gene of Micromonospora megalomicea. Finally, the
S. erythraea eryG gene product converts mycarose to cladinose, which does not
occur in M. megalomicea. Thus, the present invention provides a wide variety of
S. erythraea recombinant host cells, including, for example, those that contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with 
recombinant megosamine biosynthetic and transfer genes, with and without 
megalomicin acylation genes; and 

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.
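
Read together, items (i) through (iii) describe a small combinatorial set of host cells. The following Python sketch enumerates that set for illustration only; the function name host_genotypes and the dictionary keys are hypothetical labels, and the sketch assumes each listed option is an independent yes/no choice.

```python
from itertools import product


def host_genotypes():
    """Enumerate the S. erythraea host-cell variants of items (i)-(iii)."""
    variants = []
    # Three independent choices: PKS source, eryG status, acylation genes.
    for meg_pks, eryG_present, acylation in product(
        (False, True), (True, False), (False, True)
    ):
        variants.append({
            "PKS genes": (
                "recombinant megA (eryA inactive or deleted)"
                if meg_pks else "wild-type eryA"
            ),
            "eryG present": eryG_present,
            "megosamine biosynthetic and transfer genes": True,
            "megalomicin acylation genes": acylation,
        })
    return variants
```

On this reading there are eight variants, four with wild-type eryA genes (items (i) and (ii)) and four in which recombinant megA genes replace them (item (iii)).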

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin 
biosynthetic gene has been introduced. 

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyl transferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152, either in one or two plasmids,
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216:215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids, and the production of megalomicin A
or its precursor is mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D, or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 43:207-33; Jenkins and
Cundliffe, 1991, Gene 108:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than is present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes can be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This work is expected to be straightforward, because
self-resistance genes are usually found in the cluster of macrolide biosynthesis genes
and can be identified by their homology to known macrolide resistance genes
and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be re-cloned
from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations - the two N-dimethyltransferase genes and the three
glycosyltransferase genes - to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D, and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 55:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono- or
diglycoside if the erythromycin glycosyl transferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes (specifically, the necessary deoxysugar biosynthesis and
attachment genes) into the appropriate S. erythraea background or into S. lividans
to create a recombinant strain that produces megalomicin A.

The acyl transferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1, and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin, or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more
stable recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant
vectors encoding one or more of the megosamine, desosamine, and mycarose
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the
vector of the invention is introduced. For example, neither Streptomyces lividans
nor S. coelicolor is naturally capable of making megosamine, desosamine, or
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and
transferring those moieties to a polyketide.

Moreover, additional recombinant gene products can be expressed in the
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6
prior to processing by other domains on the PKS, in particular, any KR, DH,
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods, 
expression vectors, and recombinant host cells that enable the production of
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a 
wide variety of polyketides derived in part from the megalomicin PKS or other 
biosynthetic genes, as described in the following Section. 

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding
each of the domains of each of the modules of the megalomicin PKS as well as the
other megalomicin biosynthetic enzymes. The availability of these compounds
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification
enzymes. These compounds also permit the modification of polyketides with the
various megalomicin modification enzymes. The resulting hybrid PKS can then be
expressed in a host cell to produce a desired polyketide or modified form thereof.

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or may be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS is composed of a
loading module; six extender modules, each composed of a KS, AT, ACP, and zero,
one, two, or three KR, DH, and ER domains; and a thioesterase domain. The DNA
compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives
expression of the subunit in a host cell that expresses the other subunits and so
produces a functional PKS.
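
The modular organization described above, and the AT domain replacement in extender module 3 given earlier as an example of a hybrid PKS, can be sketched as a simple data structure. This Python sketch is illustrative only: the names ExtenderModule and swap_at are hypothetical, reductive domains are omitted, and the assumption that every extender module of the starting PKS selects methylmalonyl CoA is made for the example rather than taken from this passage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExtenderModule:
    number: int        # position of the extender module in the PKS
    at_source: str     # PKS from which the module's AT domain originates
    at_substrate: str  # extender unit selected by the AT domain


def swap_at(modules, number, donor_pks, substrate):
    """Return a new module list with one module's AT domain replaced."""
    if not any(m.number == number for m in modules):
        raise ValueError(f"no extender module {number}")
    return [
        replace(m, at_source=donor_pks, at_substrate=substrate)
        if m.number == number else m
        for m in modules
    ]


# Assumption for illustration: all six ATs select methylmalonyl CoA.
meg_pks = [
    ExtenderModule(n, "megalomicin", "methylmalonyl CoA")
    for n in range(1, 7)
]

# Hybrid PKS: the AT of extender module 3 now selects malonyl CoA.
hybrid_pks = swap_at(meg_pks, 3, "heterologous PKS", "malonyl CoA")
```

Because the dataclass is frozen and swap_at builds a new list, the starting PKS description is left unchanged, which mirrors the fact that a hybrid construct is a separate recombinant compound.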

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by that for the megalomicin PKS loading module,
provides a novel PKS. Examples of such heterologous PKS coding sequences
include the DEBS, rapamycin, FK-506, FK-520, rifamycin, and avermectin PKS
coding sequences. In another embodiment, a DNA compound comprising a
sequence that encodes the megalomicin PKS loading module is inserted into a
DNA compound that comprises the coding sequence for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KSQ, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the first
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS first 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for modules of the heterologous PKS, provides a novel PKS 
coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a 
megalomicin derivative. 

In another embodiment, a portion or all of the first extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the 
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a 
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate 
from a coding sequence for another module of the megalomicin PKS, from a gene 
for a PKS that produces a polyketide other than megalomicin, or from chemical 
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide. 
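
The reductive domain replacements described above can be pictured as edits to
the set of active reductive domains in a module. The following Python sketch is
offered only as an illustration. It assumes the conventional model of the
reductive loop, which this specification does not spell out: no active KR leaves
a keto group at the beta-carbon, a KR alone gives a hydroxyl, a KR and DH give a
double bond, and a KR, DH, and ER together give a fully reduced methylene. The
function name and the set-based representation are hypothetical.

```python
def beta_carbon_state(domains):
    """Return the expected beta-carbon functionality for one extender module.

    `domains` is an iterable of active reductive domain names, e.g. {"KR"}.
    Assumption: the domains act in the order KR -> DH -> ER, so a DH has no
    effect without an active KR, and an ER has no effect without KR and DH.
    """
    active = set(domains)
    if "KR" not in active:
        return "keto"
    if "DH" not in active:
        return "hydroxyl"
    if "ER" not in active:
        return "double bond"
    return "methylene"


# Hypothetical edits to a module that natively carries only an active KR.
native = {"KR"}
variants = [
    ("native", native),
    ("KR deleted", native - {"KR"}),
    ("DH inserted", native | {"DH"}),
    ("DH and ER inserted", native | {"DH", "ER"}),
]

for label, doms in variants:
    print(label, "->", beta_carbon_state(doms))
```

Under this simplified model, only the unmodified module leaves a hydroxyl group
at the position it processes.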

Those of skill in the art will recognize, however, that deletion of the KR
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule.

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode such PKSs in which the 
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a 
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more, 
typically two, proteins to form the multi-protein functional PKS. The utility of 
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S. 
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication 
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein 
by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS 
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that 
for the second extender module of the megalomicin PKS or the latter is merely 
added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions, 
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module 
coding sequence can be utilized in conjunction with a coding sequence from a 
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the 
third extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA 
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In 
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, 
or ACP coding sequence can originate from a coding sequence for another module
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third 
extender module coding sequence can be utilized in conjunction with a coding 
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or 
another polyketide. 

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fourth 
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the 
fourth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion of the fourth extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all 
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, 
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition,
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding 
sequence can originate from a coding sequence for another module of the 
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can 
be utilized in conjunction with a coding sequence for a PKS that synthesizes 
megalomicin, a megalomicin derivative, or another polyketide. 

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fifth 
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the 
fifth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the fifth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR; 
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH 
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding 
sequence for a PKS that produces a polyketide other than megalomicin, or from 
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide. 

The recombinant DNA compounds of the invention that encode the sixth 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS sixth 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes 
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the sixth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or
replacing the KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding 
sequence for a PKS that produces a polyketide other than megalomicin, or from 
chemical synthesis. The resulting heterologous sixth extender module coding 
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a 
thioesterase domain. This domain is important in the cyclization of the polyketide 
and its cleavage from the PKS. The present invention provides recombinant DNA 
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding
sequence from another PKS gene can be inserted at the end of the sixth (or other
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase
domain are useful in constructing DNA compounds that encode the megalomicin
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result
not only:

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,

but also:

(ii) from fusions of heterologous module (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are
described herein.

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.

While many of the hybrid PKSs described above are composed primarily
of megalomicin PKS proteins, those of skill in the art recognize that the present
invention provides many different hybrid PKSs, including those composed of only
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration into the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in expressing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the
products of the picAI and picAII genes (the two proteins that comprise the loading
module and extender modules 1 - 4, inclusive, of the narbonolide PKS) and the
megAIII gene. The resulting hybrid PKS produces the macrolide aglycone
3-hydroxy-narbonolide in Streptomyces lividans host cells and the corresponding
erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329, and Baltz, 1998,
Trends Microbiol. 6: 76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the
recombinant megalomicin PKS genes of the invention.

These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have
been used for precursor directed biosynthesis of analogs that incorporate
synthetically derived starter units. For example, more than 100 novel polyketides
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92: 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96: 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.

Methods for generating libraries of polyketides have been greatly improved
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277: 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large
collection of glycosylated polyketides can be prepared.
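
The multi-plasmid approach is combinatorial: if each of three compatible
plasmids carries one PKS gene, and each gene is available in several wild-type,
mutant, or hybrid versions, the number of distinct PKSs that can be assembled is
the product of the number of versions per plasmid. The Python sketch below
enumerates such combinations. The variant labels are hypothetical placeholders
chosen for illustration, not constructs described in this specification.

```python
from itertools import product

# Hypothetical variants of each PKS gene, one gene per plasmid.
plasmid_variants = {
    "megAI": ["wild-type", "KS1-null", "AT2-swap"],
    "megAII": ["wild-type", "AT3-swap"],
    "megAIII": ["wild-type", "KR6-deleted", "AT6-swap", "TE-swap"],
}


def enumerate_library(variants):
    """Yield every combination of one variant per plasmid as a dict."""
    genes = list(variants)
    for combo in product(*(variants[g] for g in genes)):
        yield dict(zip(genes, combo))


library = list(enumerate_library(plasmid_variants))
print(len(library))  # 3 * 2 * 4 = 24 combinations
print(library[0])
```

Adding a fourth dictionary entry for a vector carrying glycosyl biosynthesis and
transfer genes would multiply the library size in the same way.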

The following Table lists references describing illustrative PKS genes and
corresponding enzymes that can be utilized in the construction of the recombinant
hybrid PKSs and the corresponding DNA compounds that encode them of the
invention. Also presented are various references describing tailoring enzymes and
corresponding genes that can be employed in accordance with the methods of the
invention.

Avermectin
    U.S. Pat. No. 5,252,474 to Merck.
    MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
        Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256,
        A Comparison of the Genes Encoding the Polyketide Synthases for
        Avermectin, Erythromycin, and Nemadectin.
    MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
        Streptomyces avermitilis genes encoding the avermectin polyketide
        synthase.
Candicidin (FR008)
    Hu et al., 1994, Mol. Microbiol. 14: 163-172.
Epothilone
    PCT Pub. No. 00/031247 to Kosan.
Erythromycin
    PCT Pub. No. 93/13663 to Abbott.
    US Pat. No. 5,824,513 to Abbott.
    Donadio et al., 1991, Science 252: 675-9.
    Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
        multifunctional polypeptide in the erythromycin producing polyketide
        synthase of Saccharopolyspora erythraea.
  Glycosylation Enzymes
    PCT Pub. No. 97/23630 to Abbott.
FK-506
    Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
        ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
    Motamedi et al., 1997, Structural organization of a multifunctional
        polyketide synthase involved in the biosynthesis of the macrolide
        immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.
  Methyltransferase
    US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
        Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
    Motamedi et al., 1996, Characterization of methyltransferase and
        hydroxylase genes involved in the biosynthesis of the
        immunosuppressants FK506 and FK520, J. Bacteriol. 178: 5243-5248.
FK-520
    PCT Pub. No. 00/20601 to Kosan.
    See also Nielsen et al., 1991, Biochem. 30: 5789-96 (enzymology of
        pipecolate incorporation).
Lovastatin
    U.S. Pat. No. 5,744,350 to Merck.
Narbomycin (and Picromycin)
    PCT Pub. No. WO 99/61599 to Kosan.
Nemadectin
    MacNeil et al., 1993, supra.
Niddamycin
    Kakavas et al., 1997, Identification and characterization of the niddamycin
        polyketide synthase genes from Streptomyces caelestis, J. Bacteriol.
        179: 7515-7522.
Oleandomycin
    Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
        encoding a type I polyketide synthase which has an unusual coding
        sequence, Mol. Gen. Genet. 242: 358-362.
    PCT Pub. No. 00/026349 to Kosan.
    Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
        region involved in oleandomycin biosynthesis, which encodes two
        glycosyltransferases responsible for glycosylation of the macrolactone
        ring, Mol. Gen. Genet. 259(3): 299-308.
Platenolide
    EP Pub. No. 791,656 to Lilly.
Rapamycin
    Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
        polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.
    Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
        rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic
        domains in the modular polyketide synthase, Gene 169: 9-16.
Rifamycin
    August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
        rifamycin: deductions from the molecular analysis of the rif
        biosynthetic gene cluster of Amycolatopsis mediterranei S699,
        Chemistry & Biology, 5(2): 69-79.
Soraphen
    U.S. Pat. No. 5,716,849 to Novartis.
    Schupp et al., 1995, J. Bacteriology 177: 3673-3679, A Sorangium
        cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the
        Macrolide Antibiotic Soraphen A: Cloning, Characterization, and
        Homology to Polyketide Synthase Genes from Actinomycetes.
Spiramycin
    U.S. Pat. No. 5,098,837 to Lilly.
  Activator Gene
    U.S. Pat. No. 5,514,544 to Lilly.
Tylosin
    EP Pub. No. 791,655 to Lilly.
    Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide
        through the construction of a hybrid polyketide synthase.
    U.S. Pat. No. 5,876,991 to Lilly.
  Tailoring enzymes
    Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355,
        Analysis of five tylosin biosynthetic genes from the tylBA region of
        the Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention.

In constructing hybrid PKSs of the invention, certain general methods may
be helpful. For example, it is often beneficial to retain the framework of the
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and
ER functionalities to a module, it is often preferred to replace the KR domain of
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units,"
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases," Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and 
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one 
illustrative example, the invention provides a hybrid PKS that contains the 
naturally occurring loading module and thioesterase domain as well as extender 
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA 
and derived from, for example, the rapamycin PKS genes. 

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
different genes or gene clusters derived from a naturally occurring PKS gene
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or
other naturally occurring PKS includes a modular PKS (or its corresponding
encoding gene(s)) that retains the scaffolding of the utilized portion of the
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the origin of the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS.

Thus, a PKS derived from the megalomicin PKS includes a PKS that
contains the scaffolding of all or a portion of the megalomicin PKS. The derived
PKS also contains at least two extender modules that are functional, preferably
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
polyketide is altered at both the protein and DNA sequence levels. Particular
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to
change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels.

Thus, there are at least five degrees of freedom for constructing a hybrid
PKS in terms of the polyketide that will be produced. First, the polyketide chain
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the PKS is
determined by the specificities of the acyl transferases that determine the nature of
the extender units at each position, e.g., malonyl, methylmalonyl, ethylmalonyl, or
other substituted malonyl. Third, the loading module specificity also has an effect
on the resulting carbon skeleton of the polyketide. The loading module may use a
different starter unit, such as acetyl, butyryl, and the like. As noted above, another
method for varying loading module specificity involves inactivating the KS
activity in extender module 1 (KS1) and providing alternative substrates, called
diketides, that are chemically synthesized analogs of extender module 1 diketide
products, for extender module 2. This approach was illustrated in PCT publication
Nos. WO 97/02358 and WO 99/03986, incorporated herein by reference, wherein
the KS1 activity was inactivated through mutation. Fourth, the oxidation state at
various positions of the polyketide will be determined by the dehydratase and
reductase portions of the modules. This will determine the presence and location
of ketone and alcohol moieties and C-C double bonds or C-C single bonds in the
polyketide.

Finally, the stereochemistry of the resulting polyketide is a function of
three aspects of the synthase. The first aspect is related to the AT/KS specificity
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity
of the ketoreductase may determine the chirality of any beta-OH. Finally, the
enoyl reductase specificity for substituted malonyls as extender units may influence
the stereochemistry when there is a complete KR/DH/ER available.
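
By way of illustration only, the relationship between the reductive domains of an extender module and the oxidation state at the corresponding position of the polyketide can be sketched as a small lookup. The class name `ExtenderModule`, the table `BETA_CARBON_OUTCOME`, and the helper `describe` are hypothetical names introduced for this sketch; the sketch models only the four reductive outcomes named in the text and makes no attempt to predict stereochemistry.

```python
from dataclasses import dataclass

# Functional group left at the beta-carbon for each complement of reductive
# domains, following the four outcomes described in the text (assumed mapping).
BETA_CARBON_OUTCOME = {
    frozenset(): "ketone",
    frozenset({"KR"}): "alcohol",
    frozenset({"KR", "DH"}): "C-C double bond",
    frozenset({"KR", "DH", "ER"}): "C-C single bond",
}


@dataclass(frozen=True)
class ExtenderModule:
    """Hypothetical, simplified description of one extender module."""
    extender_unit: str             # AT specificity, e.g. "malonyl" or "methylmalonyl"
    reductive_domains: frozenset   # subset of {"KR", "DH", "ER"}

    def beta_carbon(self) -> str:
        """Return the group expected at the beta-carbon for this module."""
        try:
            return BETA_CARBON_OUTCOME[self.reductive_domains]
        except KeyError:
            raise ValueError(
                f"unsupported reductive cycle: {sorted(self.reductive_domains)}"
            ) from None


def describe(modules):
    """List (extender unit, beta-carbon group) for each module in order."""
    return [(m.extender_unit, m.beta_carbon()) for m in modules]


if __name__ == "__main__":
    # Illustrative two-module arrangement; not taken from any actual PKS.
    example = [
        ExtenderModule("methylmalonyl", frozenset({"KR"})),
        ExtenderModule("malonyl", frozenset({"KR", "DH", "ER"})),
    ]
    for unit, group in describe(example):
        print(unit, "->", group)
```

A combination such as DH without KR is rejected in this sketch because the text describes only the four cumulative reductive outcomes.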

Thus, the modular PKS systems generally and the megalomicin PKS
system particularly permit a wide range of polyketides to be synthesized. As
compared to the aromatic PKS systems, the modular PKS systems accept a wider
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl,
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90: 7119-
7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products
in a highly controlled manner. The polyketides, antibiotics, and other compounds
produced by the methods of the invention are typically single stereoisomeric
forms. Although the compounds of the invention can occur as mixtures of
stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways
based on any naturally occurring modular, such as the megalomicin, PKS scaffold
is virtually unlimited.
While hybrid PKSs are most often produced by "mixing and matching"
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates
for mutation can be an entire cluster of genes or only one or two of them; the
substrate for mutation may also be portions of one or more of these genes.
Techniques for mutation include preparing synthetic oligonucleotides including
the mutations and inserting the mutated sequence into the gene encoding a PKS
subunit using restriction endonuclease digestion. See, e.g., Kunkel, 1985, Proc.
Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, Biotechniques 5: 786.
Alternatively, the mutations can be effected using a mismatched primer (generally
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a
temperature below the melting temperature of the mismatched duplex. The primer
can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located. See
Zoller and Smith, 1983, Methods Enzymol. 100: 468. Primer extension is effected
using DNA polymerase, the product cloned, and clones containing the mutated
DNA, derived by segregation of the primer extended strand, selected.
Identification can be accomplished using the mutant primer as a hybridization
probe. The technique is also applicable for generating multiple point mutations.
See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409.
PCR mutagenesis can also be used to effect the desired mutations.
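
As a rough illustration of the mismatched-primer approach, the following sketch builds a primer with the mutant base centrally located and estimates its melting temperature. The function names `mutagenic_primer` and `wallace_tm` and the example sequence are hypothetical. The estimate uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is an assumption of this sketch rather than a method stated in the text; it applies to a perfectly matched short duplex, so the mismatched duplex can be expected to melt somewhat lower. For simplicity the primer is written in the same sense as the supplied strand.

```python
def mutagenic_primer(template: str, position: int, new_base: str, flank: int = 8) -> str:
    """Return a primer identical to `template` around `position` except for
    `new_base` at `position`, with `flank` matched bases on each side so that
    the mismatch sits in the centre of the primer."""
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around position")
    if template[position] == new_base:
        raise ValueError("new_base equals the native base; no mutation introduced")
    left = template[position - flank:position]
    right = template[position + 1:position + flank + 1]
    return left + new_base + right


def wallace_tm(oligo: str) -> int:
    """Wallace rule-of-thumb melting temperature (deg C) for a short oligo."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Arbitrary illustrative sequence, not a megalomicin PKS sequence.
    native = "ATGCATGCATGCATGCATGCATGC"
    primer = mutagenic_primer(native, position=10, new_base="A")
    print(primer, len(primer), wallace_tm(primer))
```

With the default flank of 8 the primer is 17 nucleotides long, which falls within the 10-20 nucleotide range mentioned above.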

Random mutagenesis of selected portions of the nucleotide sequences
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, and hydroxylamine; agents that damage or remove bases, thereby
preventing normal base-pairing, such as hydrazine or formic acid; analogues of
nucleotide precursors, such as 5-bromouracil and 2-aminopurine; and acridine
intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli, and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKSs
or from different locations in the same PKS, can be recovered, for
example, using PCR techniques with appropriate primers. By "corresponding"
activity encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR activity encoded at another location in the gene cluster or
in a different gene cluster. Similarly, a complete reductase cycle could be
considered corresponding. For example, KR/DH/ER can correspond to a KR
alone.

If replacement of a particular target region in a host PKS is to be made,
this replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used
to perform the various operations to replace the enzymatic activity in the host PKS
genes or to support mutations in these regions of the host PKS genes can be
chosen to contain control sequences operably linked to the resulting coding
sequences in a manner such that expression of the coding sequences can be
effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning
vectors employed to obtain PKS genes encoding derived PKS lack control
sequences for expression operably linked to the encoding nucleotide sequences,
the nucleotide sequences are inserted into appropriate expression vectors. This
need not be done individually, but a pool of isolated encoding nucleotide
sequences can be inserted into expression vectors, the resulting vectors
transformed or transfected into host cells, and the resulting cells plated out into
individual colonies. The invention provides a variety of recombinant DNA
compounds in which the various coding sequences for the domains and modules
of the megalomicin PKS are flanked by non-naturally occurring restriction enzyme
recognition sites.

The various PKS nucleotide sequences can be cloned into one or more
recombinant vectors as individual cassettes, with separate control elements, or
under the control of, e.g., a single promoter. The PKS subunit encoding regions
can include flanking restriction sites to allow for the easy deletion and insertion of
other PKS subunit encoding sequences so that hybrid PKSs can be generated. The
design of such unique restriction sites is known to those of skill in the art and can
be accomplished using the techniques described above, such as site-directed
mutagenesis and PCR.

The expression vectors containing nucleotide sequences encoding a variety
of PKS enzymes for the production of different polyketides are then transformed
into the appropriate host cells to construct the library. In one straightforward
approach, a mixture of such vectors is transformed into the selected host cells and
the resulting cells plated into individual colonies and selected to identify
successful transformants. Each individual colony has the ability to produce a
particular PKS and ultimately a particular polyketide. Typically, there
will be duplications in some, most, or all of the colonies; the subset of the
transformed colonies that contains a different PKS in each member colony can be
considered the library. Alternatively, the expression vectors can be used
individually to transform hosts, which transformed hosts are then assembled into a
library. A variety of strategies are available to obtain a multiplicity of colonies
each containing a PKS gene cluster derived from the naturally occurring host gene
cluster so that each colony in the library produces a different PKS and ultimately a
different polyketide. The number of different polyketides that are produced by the
library is typically at least four, more typically at least ten, and preferably at least
20, and more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with respect
to the variation of starter, extender units, stereochemistry, oxidation state, and
chain length enable the production of quite large libraries.
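
The selection of a library from a plate of transformants that contains duplicates can be sketched, purely for illustration, as keeping one representative colony per distinct PKS. The function name `library_from_colonies`, the colony identifiers, and the genotype labels are hypothetical and assume that each colony has already been characterized well enough to assign it a genotype label.

```python
def library_from_colonies(colonies):
    """Given (colony_id, pks_genotype) pairs, keep the first colony seen for
    each distinct genotype. Returns a dict mapping genotype -> colony_id."""
    library = {}
    for colony_id, genotype in colonies:
        # setdefault leaves an existing entry untouched, so duplicates are skipped
        library.setdefault(genotype, colony_id)
    return library


if __name__ == "__main__":
    # Hypothetical plate: colonies c1 and c3 carry the same altered PKS.
    plate = [("c1", "AT3-swap"), ("c2", "KR5-deletion"), ("c3", "AT3-swap")]
    print(library_from_colonies(plate))
```

The size of the resulting dictionary corresponds to the number of different PKSs, and hence different polyketides, represented in the library.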
Methods for introducing the recombinant vectors of the invention into
suitable hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a
library or may be assessed individually for activity.

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course, combination
libraries can also be constructed wherein members of a library derived, for
example, from the megalomicin PKS can be considered as a part of the same
library as those derived from, for example, the rapamycin PKS or DEBS.

Colonies in the library are induced to produce the relevant synthases and
thus to produce the relevant polyketides to obtain a library of polyketides. The
polyketides secreted into the media can be screened for binding to desired targets,
such as receptors, signaling proteins, and the like. The supernatants per se can be
used for screening, or partial or complete purification of the polyketides can first
be effected. Typically, such screening methods involve detecting the binding of
each member of the library to receptor or other target ligand. Binding can be
detected either directly or through a competition assay. Means to screen such
libraries for binding are well known in the art and can be applied in accordance
with the methods of the present invention. Alternatively, individual polyketide
members of the library can be tested against a desired target. In this event, screens
wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137: 167-173, incorporated
herein by reference, and in the Examples below.

The invention provides methods for the preparation of a large number of
polyketides. These polyketides are useful intermediates in formation of
compounds with antibiotic or other activity through hydroxylation, epoxidation,
and glycosylation reactions as described above. In general, the polyketide products
of the PKS must be further modified, typically by hydroxylation and glycosylation,
to exhibit potent antibiotic activity. Hydroxylation results in the novel polyketides
of the invention that contain hydroxyl groups at C-6, which can be accomplished
using the hydroxylase encoded by the eryF gene, and/or C-12, which can be
accomplished using the hydroxylase encoded by the picK or eryK gene. Also, the
oleP gene is available in recombinant form, which can be used to express the oleP
gene product in any host cell. A host cell, such as a Streptomyces host cell or a
Saccharopolyspora erythraea host cell, modified to express the oleP gene thus can
be used to produce polyketides comprising the C-8-C-8a epoxide present in
oleandomycin. Thus the invention provides such modified polyketides. The
presence of hydroxyl groups at these positions can enhance the antibiotic activity
of the resulting compound relative to its unhydroxylated counterpart.

Methods for glycosylating polyketides are generally known in the art and
can be applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means
as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose, and/or
megosamine is effected in accordance with the methods of the invention in
recombinant host cells provided by the invention. In general, the approaches to
effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods may
be employed.

The antibiotic modular polyketides may contain any of a number of
different sugars, although D-desosamine, or a close analog thereof, is most
common. Erythromycin, picromycin, megalomicin, narbomycin, and methymycin
contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl
mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-
3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones
as starting materials and using Saccharopolyspora erythraea or Streptomyces
venezuelae or other host cell to make the conversion, preferably using mutants
unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS 
enzymes of the invention. These polyketides are useful as antibiotics and as 
intermediates in the synthesis of other useful compounds, as described in the 
following section. 

Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid
encoding a megalomicin PKS domain, module, or protein, or megalomicin
modification enzyme at a single genetic locus, e.g., on a single plasmid or at a
single chromosomal locus, or at different genetic loci, e.g., on separate plasmids
and/or chromosomal loci. By "multiple" is meant two or more; by "vector" is
meant a nucleic acid molecule which can be used to transform host systems and
which contains an independent expression system containing a coding sequence
under control of a promoter and optionally a selectable marker and any other
suitable sequences regulating expression. Typical such vectors are plasmids, but
other vectors such as phagemids, cosmids, viral vectors and the like can be used
according to the nature of the host. Of course, one or more of the separate vectors
may integrate into the chromosome of the host (selection may not be required for
maintenance of integrated vectors).

In one embodiment, the invention provides a recombinant host cell, which
comprises at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprising a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter. In another embodiment, the invention provides a
recombinant host cell, which comprises at least one autonomously replicating
recombinant DNA expression vector and at least one modified chromosome, each
of said vector(s) and each of said modified chromosome(s) comprising a
recombinant DNA compound encoding a megalomicin PKS domain or a megalomicin
modification enzyme operably linked to a promoter. Preferably, the autonomously
replicating recombinant DNA expression vector and/or the modified chromosome
further comprises distinct selectable markers.

The above multiple-vector (chromosome) expression systems can also be
used for expressing heterogeneous polyketide biosynthetic enzymes, e.g., for
expressing Micromonospora megalomicea megalomicin PKS protein, module, or
domain or a megalomicin modification enzyme with a PKS protein, module, or
domain, or modification enzyme from other origins in the same host cells. By
placing various activities on different expression vectors, a high degree of
variation can be achieved in an efficient manner. A variety of hosts can be used;
any suitable host cell that can maintain multiple vectors can readily be used.
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and plant
cells; mammalian or insect cells or other suitable recombinant hosts can also
be used. Preferred among yeast strains are Saccharomyces cerevisiae and Pichia
pastoris. Preferred actinomycetes include various strains of Streptomyces.

If one chooses to use a host cell that does not naturally produce a
polyketide, then one may need to ensure that the recombinant host is modified to
also contain a holo ACP synthase activity that effects pantetheinylation of the acyl
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by
reference. One of the multiple vectors may be used for this purpose. This
activation step is necessary for activation of the ACP. The expression system for
the holo ACP synthase may be supplied on a vector separate from that carrying a
PKS coding sequence or may be supplied on the same vector or may be integrated
into the chromosome of the host, or may be supplied as an expression system for a
fusion protein with all or a portion of a polyketide synthase (see U.S. Patent No.
6,033,883, incorporated herein by reference).

It should be noted that in some recombinant hosts, it may also be necessary
to activate the polyketides produced through postsynthesis modifications when
polyketides having such modifications are desired. If this is the case for a
particular host, the host will be modified, for example by transformation, to
contain those enzymes necessary for effecting these modifications. Among such
enzymes, for example, are glycosylation enzymes. The use of multiple vectors can
facilitate the introduction of expression systems for such enzymes.

In a preferred embodiment, the multiple vector system is used to assemble
rapidly and efficiently a combinatorial library of polyketides and the
PKS/modification enzymes that produce them. In an illustrative embodiment, the
multiple vector system comprises four different vectors, one comprising the megAI
gene, one the megAII gene, one the megAIII gene, and one the modification
enzyme(s) gene(s). Each of these vectors can be modified to make a set of vectors.
For example, one set could contain all possible AT substitutions in the loading and
first and second extender modules of the megAI gene product. Another set could
contain expression systems for a variety of different modification enzymes. With
these four vector sets and by combining each member of each set with each
member of the other three sets, a very large library of cells, vector sets, and
polyketides can be rapidly and efficiently assembled.
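
The combination of each member of each vector set with each member of the other sets is a Cartesian product, which can be sketched as follows. The function name `assemble_library` and all vector labels are hypothetical placeholders introduced for illustration; they do not denote actual constructs.

```python
from itertools import product


def assemble_library(*vector_sets):
    """Return every combination that takes exactly one vector from each set."""
    return list(product(*vector_sets))


if __name__ == "__main__":
    # Hypothetical vector sets; the labels are placeholders only.
    megAI_set = ["megAI-wt", "megAI-AT1swap", "megAI-AT2swap"]
    megAII_set = ["megAII-wt", "megAII-KR3del"]
    megAIII_set = ["megAIII-wt", "megAIII-AT5swap"]
    modification_set = ["glycosylation-only", "hydroxylation+glycosylation"]

    library = assemble_library(megAI_set, megAII_set, megAIII_set, modification_set)
    print(len(library))  # product of the four set sizes
    print(library[0])
```

The number of cells in such a library is the product of the sizes of the four sets, so modest sets on each vector multiply into a large library.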

The combinatorial potential of a modular PKS such as the megalomicin
PKS (ignoring the additional potential of different modification enzyme systems)
is minimally given by $AT_L \times (AT_e \times 4)^M$, where $AT_L$ is the number
of loading acyl transferases, $AT_e$ is the number of extender acyl transferases,
and $M$ is the number of modules in the gene cluster. The number 4 is present in
the formula because this represents the number of ways a keto group can be
modified by either 1) no reaction; 2) KR activity alone; 3) KR+DH activity; or
4) KR+DH+ER activity. It has been shown that expression of only the first two
modules of the erythromycin PKS resulted in the production of a predicted
truncated triketide product (see Kao et al., J. Am. Chem. Soc. 116:11612-11613
(1994)). A novel 12-membered macrolide similar to methymycin aglycone was
produced by expression of modules 1-5 of this PKS in S. coelicolor (see Kao et
al., J. Am. Chem. Soc. 117:9105-9106 (1995)). This work shows that PKS modules
are functionally independent so that lactone ring size can be controlled by the
number of modules present.
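
The minimal combinatorial potential stated above, the number of loading acyl transferases multiplied by (the number of extender acyl transferases times 4) raised to the number of modules, can be evaluated directly. The function name `minimal_combinatorial_potential` and the input values below are illustrative assumptions, not figures taken from this specification.

```python
def minimal_combinatorial_potential(n_loading_at: int, n_extender_at: int, n_modules: int) -> int:
    """Evaluate AT_L x (AT_e x 4)^M, where 4 counts the reductive outcomes
    (no reaction, KR, KR+DH, KR+DH+ER) available at each module."""
    if min(n_loading_at, n_extender_at, n_modules) < 0:
        raise ValueError("counts must be non-negative")
    return n_loading_at * (n_extender_at * 4) ** n_modules


if __name__ == "__main__":
    # Assumed example: 2 loading ATs, 2 extender ATs, 6 modules.
    print(minimal_combinatorial_potential(2, 2, 6))  # 2 * 8**6 = 524288
```

Even with only two choices of acyl transferase at each position, a six-module PKS gives more than half a million combinations under this formula, before any modification enzymes are considered.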
In addition to controlling the number of modules, the modules can be
genetically modified, for example, by the deletion of a ketoreductase domain as
described by Donadio et al., Science, 252:675-679 (1991); and Donadio et al.,
Gene, 115:97-103 (1992). In addition, the mutation of an enoyl reductase domain
was reported by Donadio et al., Proc. Natl. Acad. Sci., 90:7119-7123 (1993).
These modifications also resulted in modified PKS and thus modified polyketides.

As stated above, in the present invention, the coding sequences for
catalytic activities derived from the megalomicin PKS systems found in nature can
be used in their native forms or modified by standard mutagenesis techniques to
delete or diminish activity or to introduce an activity into a module in which it was
not originally present. For example, a KR activity can be introduced into a
module normally lacking that function.

In one embodiment of the invention herein, a single host cell is modified to
contain a multiplicity of vectors, each vector contributing a portion of the
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each
of the multiple vectors for production of the megalomicin PKS system typically
encodes at least two modules, and at least one of the vectors integrates into the
chromosome of the host. Integration can be effected using suitable phage or
integrating vectors or by homologous recombination. If homologous
recombination is used, the integration event may also be designed to delete
endogenous PKS genes residing in the chromosome, as described in the PCT
application WO 95/08548. In these embodiments, too, a selectable marker such as
hygromycin or thiostrepton resistance can be included in the vector that effects
integration.

As mentioned above, additional enzymes that effect post-translational
modifications to the enzyme systems in the megalomicin PKS may be introduced
into the host through suitable recombinant expression systems. In addition,
enzymes that activate the polyketides themselves, for example, through
glycosylation may be added. It may also be desirable to modify the cell to produce
more of a particular substrate utilized in polyketide biosynthesis. For example, it
is generally believed that malonyl CoA levels in yeast are higher than
methylmalonyl CoA; if yeast is chosen as a host, it may be desirable to increase
methylmalonyl CoA levels by the addition of one or more biosynthetic enzymes
therefor.

The multiple-vector expression system can also be used to make
polyketides produced by the addition of synthetic starter units to a PKS that
contains an inactivated ketosynthase (KS) in the first module. As noted above,
this modification permits the system to incorporate a suitable diketide thioester
such as 3-hydroxy-2-methyl pentanoic acid-N-acetyl cysteamine thioester, or
similar thioesters of diketide analogs, as described by Jacobsen et al., Science,
277:367-369 (1997). The construction of PKS modules containing inactivated
ketosynthase regions can be conducted by methods known in the art, such as the
method described in U.S. Patent No. 6,080,555 and PCT publication Nos. WO
99/03986 and WO 97/02358, each of which is incorporated herein by reference, in
accordance with the methods of the present invention.

The multiple-vector expression system can be used to produce polyketides
in hosts that normally do not produce them, such as E. coli and yeast. It also
provides more efficient means to provide a variety of polyketide products by
supplying the elements of the introduced PKS, whether in an E. coli or yeast host
or in other more traditionally used hosts, such as Streptomyces. The invention
also includes libraries of polyketides prepared using the methods of the invention.

Section VIII: Compounds

The methods and recombinant DNA compounds of the invention are useful
in the production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to erythromycin, a
potent antibiotic compound. The invention also provides novel ketolide
compounds, polyketide compounds with potent antibiotic activity of significant
interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by
reference. Most if not all of the ketolides prepared to date are synthesized using
erythromycin A, a derivative of 6-dEB, as an intermediate. In one embodiment,
the present invention provides the 3-keto derivatives of the megalomicins for use
as antibiotics. In particular, the 3-keto derivative of megalomicin A is a preferred
ketolide of the invention. These compounds can be made chemically, substantially
in accordance with the procedures for making ketolides described in the prior art,
or in recombinant host cells of the invention in which the megosamine and
desosamine biosynthetic and transferase genes are present but which do not make
or transfer the mycarose moiety and/or in which the PKS has been modified to
delete the KR domain of extender module 6. The invention also provides methods
for making intermediates useful in preparing traditional, 6-dEB- and
erythromycin-derived ketolide compounds. See Griesgraber et al., supra;
Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579;
5,760,233; 5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614;
5,556,118; 5,543,400; 5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT
publication Nos. WO 98/09978 and 98/28316, each of which is incorporated herein
by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in
a host cell that contains the desosamine, megosamine, and/or mycarose
biosynthetic genes and corresponding transferase genes as well as the required
hydroxylase gene(s), which may, for example and without limitation, be either
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6
position). The resulting compounds have antibiotic activity but can be further
modified, as described in the patent publications referenced above, to yield a
desired compound with improved or otherwise desired properties. Alternatively,
the aglycone compounds can be produced in the recombinant host cell, and the
desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in the
latter case by supplying the converting cell with the aglycone, as described above.

The compounds of the invention are thus optionally glycosylated forms of
the polyketide set forth in formula (1) below which are hydroxylated at either the
C-6 or the C-12 or both. The compounds of formula (1) can be prepared using the
loading and the six extender modules of a modular PKS, modified or prepared in
hybrid form as herein described. These polyketides have the formula:

including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R1-R6 is independently H or alkyl (1-4C) wherein any alkyl at R1
may optionally be substituted;

each of X1-X5 is independently two H, H and OH, or =O; or

each of X1-X5 is independently H and the compound of formula (2)
contains a double-bond in the ring adjacent to the position of said X at 2-3, 4-5,
6-7, 8-9 and/or 10-11;

with the proviso that:

at least two of R1-R6 are alkyl (1-4C).

Preferred compounds comprising formula 2 are those wherein at least three
of R1-R5 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at
least four of R1-R5 are alkyl (1-4C), preferably methyl or ethyl. Also preferred are
those wherein X2 is two H, =O, or H and OH, and/or X3 is H, and/or X1 is OH
and/or X4 is OH and/or X5 is OH. Also preferred are compounds with variable R*
when R1-R5 is methyl, X2 is =O, and X1, X4 and X5 are OH. The glycosylated
forms (i.e., mycarose or cladinose at C-3, desosamine at C-5, and/or megosamine
at C-6) of the foregoing are also preferred.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds with or
that can be readily modified to have useful activities. For example,
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful
compounds. The compounds provided by the present invention can be provided to
cultures of Saccharopolyspora erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the Examples, below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production. Each of the
erythromycins A, B, C, and D has antibiotic activity, although erythromycin A has
the highest antibiotic activity. Moreover, each of these compounds can form,
under treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For
formation of hemiketals with motilide activity, erythromycins B, C, and D are
preferred, as the presence of a C-12 hydroxyl allows the formation of an inactive
compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the compounds of the invention by action of
the enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S.
erythraea. Such compounds are useful as antibiotics or as motilides directly or
after chemical modification. For use as antibiotics, the compounds of the
invention can be used directly without further chemical modification.
Erythromycins A, B, C, and D all have antibiotic activity, and the corresponding
compounds of the invention that result from the compounds being modified by
Saccharopolyspora erythraea also have antibiotic activity. These compounds can
be chemically modified, however, to provide other compounds of the invention
with potent antibiotic activity. For example, alkylation of erythromycin at the C-6
hydroxyl can be used to produce potent antibiotics (clarithromycin is
C-6-O-methyl), and other useful modifications are described in, for example,
Griesgraber et al., 1996, J. Antibiot. 49: 465-477; Agouridas et al., 1998, J. Med.
Chem. 41: 4080-4100; U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467;
5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780;
5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO 98/09978 
and 98/28316, each of which is incorporated herein by reference. 

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically as
prokinetic agents to induce phase III of migrating motor complexes, to increase
esophageal peristalsis and LES pressure in patients with GERD, to accelerate
gastric emptying in patients with gastric paresis, and to stimulate gall bladder
contractions in patients after gallstone removal and in diabetics with autonomic
neuropathy. See Peeters, 1999, Motilide Web Site,
http://www.med.kuleuven.ac.be/med/gih/motilid.htm, and Omura et al., 1987,
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30:
1941-3. The corresponding compounds of the invention that result from the
compounds of the invention being modified by Saccharopolyspora erythraea also
have motilide activity, particularly after conversion, which can also occur in vivo,
to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds lacking the
C-12 hydroxyl are especially preferred for use as motilin agonists. These
compounds can also be further chemically modified, however, to provide other
compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that
can be employed to hydroxylate and/or glycosylate the compounds of the
invention. As described above, the organisms can be mutants unable to produce
the polyketide normally produced in that organism, the fermentation can be carried
out on plates or in large fermentors, and the compounds produced can be
chemically altered after fermentation. In addition to Saccharopolyspora erythraea,
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to
antibiotic activity, compounds of the invention produced by treatment with M.
megalomicea enzymes can have antiparasitic activity as well. Thus, the present
invention provides the compounds produced by hydroxylation and glycosylation
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S.
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene, can be included on expression
vectors suitable for expression of the encoded gene products in
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S.
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans.

The compounds of the invention can be produced by growing and
fermenting the host cells of the invention under conditions known in the art for the
production of other polyketides. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard
procedures. The compounds can be readily formulated to provide the
pharmaceutical compositions of the invention. The pharmaceutical compositions
of the invention can be used in the form of a pharmaceutical preparation, for
example, in solid, semisolid, or liquid form. This preparation will contain one or
more of the compounds of the invention as an active ingredient in admixture with
an organic or inorganic carrier or excipient suitable for external, enteral, or
parenteral application. The active ingredient may be compounded, for example,
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets,
capsules, suppositories, solutions, emulsions, suspensions, and any other form
suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquified form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used.
For example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138,
incorporated herein by reference, or with a surfactant essentially as described in
EPO patent publication No. 428,169, incorporated herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein
by reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by
reference. The active compound is included in the pharmaceutical composition in
an amount sufficient to produce the desired effect upon the disease process or
condition.

For the treatment of conditions and diseases caused by infection, a
compound of the invention may be administered orally, topically, parenterally, by
inhalation spray, or rectally in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term
parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The
dosage levels are useful in the treatment of the above-indicated conditions (from
about 0.7 mg to about 3.5 gm per patient per day, assuming a 70 kg patient). In
addition, the compounds of the invention may be administered on an intermittent
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals.
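
As a check on the arithmetic, the per-kilogram bounds stated above can be
converted to a per-patient range. The sketch below is illustrative only: the
function name is hypothetical, and the only inputs are the mg/kg bounds and the
70 kg body weight given in the text. For the broad range it yields roughly 0.7 mg
to 3500 mg (3.5 gm) per patient per day.

```python
# Convert a per-kilogram daily dose range into a per-patient daily range.
# The mg/kg bounds and the 70 kg body weight are taken from the text;
# the function name is hypothetical and used only for illustration.

def per_patient_range_mg(low_mg_per_kg, high_mg_per_kg, body_weight_kg=70.0):
    """Return (low, high) daily dose in mg for a patient of the given weight."""
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)

# Broad range: 0.01 to 50 mg/kg/day
broad = per_patient_range_mg(0.01, 50.0)
# Preferred range: 0.1 to 10 mg/kg/day
preferred = per_patient_range_mg(0.1, 10.0)

print("broad range (mg/day):", broad)
print("preferred range (mg/day):", preferred)
```

The preferred range works out to roughly 7 mg to 700 mg per patient per day
under the same 70 kg assumption.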

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host
treated and the particular mode of administration. For example, a formulation
intended for oral administration to humans may contain from 0.5 mg to 5 gm of
active agent compounded with an appropriate and convenient amount of carrier
material, which may vary from about 5 percent to about 95 percent of the total
composition. Dosage unit forms will generally contain from about 0.5 mg to about
500 mg of active ingredient. For external administration, the compounds of the
invention may be formulated within the range of, for example, 0.00001% to 60%
by weight, preferably from 0.001% to 10% by weight, and most preferably from
about 0.005% to 0.8% by weight.

It will be understood, however, that the specific dose level for any
particular patient will depend on a variety of factors. These factors include the
activity of the specific compound employed; the age, body weight, general health,
sex, and diet of the subject; the time and route of administration and the rate of
excretion of the drug; whether a drug combination is employed in the treatment;
and the severity of the particular disease or condition for which therapy is sought.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from
Micromonospora megalomicea

Experimental Procedures

Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the
ATCC collection and cultured according to recommended protocols. For isolation
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion
of the actinorhodin biosynthetic gene cluster, was used as the host for expression
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985).
S. erythraea NRRL2338 was used for expression of the megosamine genes. S.
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for
preparation of protoplasts.

Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli was performed by
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts
of S. lividans and S. erythraea were generated for transformation by plasmid DNA
using the standard procedure. S. lividans transformants were selected on R5 using
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay.

Isolation of the meg gene cluster

A cosmid library was prepared in SuperCos (Stratagene) from M.
megalomicea total DNA partially digested with Sau3A I, and introduced into E.
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the
cosmid library by colony hybridization. Several colonies which hybridized with
the probes were further analyzed by sequencing the ends of their cosmid inserts
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences
revealed several colonies with DNA sequences highly homologous to genes from
the ery cluster. Together with restriction analysis, this led to the isolation of two
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of
the meg cluster. A 400 bp PCR fragment was generated from the left end of
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR
fragment generated from the right end of pKOS079-93A was used to reprobe the
cosmid library. Analysis of hybridizing colonies as described above resulted in
identification of two additional cosmids, pKOS079-138B and pKOS79-124B,
which overlap the previous two cosmids. BLAST analysis of the far left and right
end sequences of these cosmids indicated no homology to any known genes
related to polyketide biosynthesis, indicating that the set of four
cosmids spans the entire megalomicin biosynthetic gene cluster.

DNA sequencing and analysis 

PCR-based double stranded DNA sequencing was performed on a
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was
made as follows: DNA was first digested with Dra I to eliminate the vector
fragment, then partially digested with Sau3A I. After agarose electrophoresis,
bands between 1-3 kb were excised from the gel and ligated with BamH I digested
pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was
sequenced by primer walking to extend the sequencing to the megT gene.
Sequence was assembled using the Sequencher (Gene Codes Corp.) software package
and analyzed with MacVector (Oxford Molecular Group) and the NCBI BLAST
server (www.ncbi.nlm.nih.gov/BLAST/).
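
The digest-and-size-select step used to build the shotgun library can be
sketched in silico. The example below is a simplified illustration, not the
procedure actually used: it performs a complete rather than partial digest at
GATC (the Sau3A I recognition sequence), and the toy sequence, the function
names, and the 10-40 bp selection window are all invented for the example (the
text describes a 1-3 kb window).

```python
# In silico complete digest at GATC followed by size selection.
# Assumptions (not from the text): toy sequence, function names, and the
# 10-40 bp window. The text describes a partial digest and a 1-3 kb window.

SITE = "GATC"  # Sau3A I recognition sequence; the enzyme cuts just 5' of the G

def digest(seq, site=SITE):
    """Return the fragments of a linear sequence cut at the start of each site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    # A cut at position 0 would produce an empty fragment, so it is skipped.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Toy sequence with GATC sites at positions 2, 18 and 27 (0-based).
toy = "AA" + "GATC" + "T" * 12 + "GATC" + "A" * 5 + "GATC" + "C" * 20
fragments = digest(toy)
selected = size_select(fragments, 10, 40)

print([len(f) for f in fragments])
print([len(f) for f in selected])
```

Because Sau3A I ends are compatible with BamH I ends, fragments selected this
way can be ligated into BamH I digested vector, as described above.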


Plasmids 

Plasmid pKOS108-6 is a modified version of pKA0127'kan' (Ziermann
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes
between the Pac I and EcoR I sites have been replaced with the megAI-III genes.
This was done by first substituting a synthetic nucleotide DNA duplex
(5'-TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21),
complementary oligo
5'-AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO: 22))
between the Pac I and EcoR I sites of the pKA0127'kan' vector fragment.
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the
megAI-II genes was inserted into the EcoR I and Bgl II sites of the resulting
plasmid to generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the
megAIII and part of the megCII gene was subcloned from pKOS079-93A and
excised as a Bgl II/Xba I fragment and ligated into the corresponding sites of
pKOS024-84 to yield the final expression plasmid pKOS108-06.
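
The relationship between the two linker oligonucleotides can be checked
computationally. The sketch below uses the two sequences exactly as printed
(SEQ ID NO: 21 and 22); the variable and function names are illustrative. The
reading of the unpaired ends as EcoR I- and Pac I-compatible overhangs is an
inference from the sequences, not a statement made in the text.

```python
# Check how the two linker oligos anneal and which restriction sites the
# duplex carries. Sequences are copied from the text; names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

TOP = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"           # SEQ ID NO: 21
BOTTOM = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"  # SEQ ID NO: 22

# The part of BOTTOM that base-pairs with TOP is TOP's reverse complement.
core = reverse_complement(TOP)
start = BOTTOM.find(core)
five_prime_overhang = BOTTOM[:start]                 # unpaired 5' end of BOTTOM
three_prime_overhang = BOTTOM[start + len(core):]    # unpaired 3' end of BOTTOM

# Recognition sequences of the enzymes used in the later cloning steps.
SITES = {
    "EcoR I": "GAATTC",
    "Bgl II": "AGATCT",
    "BbvC I": "CCTCAGC",
    "Xba I": "TCTAGA",
}
positions = {name: TOP.find(site) for name, site in SITES.items()}

print("5' overhang:", five_prime_overhang)
print("3' overhang:", three_prime_overhang)
print("site positions in SEQ ID NO: 21 (0-based):", positions)
```

The 5' AATT extension matches an EcoR I cohesive end and the 3' AT extension
matches the two-base 3' overhang left by Pac I, which is consistent with the
duplex being placed between the Pac I and EcoR I sites of the vector. The duplex
also carries the EcoR I, Bgl II, BbvC I, and Xba I sites used in the subsequent
steps.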

The megosamine integrating vector, pKOS97-42, was constructed as
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid
pKOS97-42.

Production and analysis of secondary metabolites 
Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKA0127'kan' were essentially as previously described (Xue et al., 1999).
S. erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brunker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and centrifuged again. Erythromycins and megalomicins
were detected by electrospray mass spectrometry and quantity was determined by
evaporative light scattering detection (ELSD). The LC retention time and mass
spectra of erythromycin and megalomicins were identical to known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) were
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea and cover > 100 kb of the genome. A contiguous 48 kb segment
which encodes the megalomicin PKS and several deoxysugar biosynthetic genes
was sequenced and analyzed. The segment contains 17 complete ORFs as well as
an incomplete ORF at each end, organized as shown in Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, a conserved Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3 is observed in meg DEBS, which has
been proposed to account for its inactivity in ery DEBS (Donadio et al., 1991).
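
The 83% average quoted above can be reproduced from the per-polypeptide
similarity values listed for megAI, megAII, and megAIII in Table 2. A minimal
check, with variable names chosen only for illustration:

```python
# Percent similarity of each meg PKS polypeptide to its ery counterpart,
# as listed in Table 2 of the text.
pks_similarity = {"megAI": 81, "megAII": 85, "megAIII": 83}

# Simple arithmetic mean over the three polypeptides.
mean_similarity = sum(pks_similarity.values()) / len(pks_similarity)
print(mean_similarity)  # 83.0
```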

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).

Table 2. Deduced functions of genes identified in the megalomicin gene cluster.

Gene      Closest Match (polypeptide)(a)   % Sim(a)   Proposed Pathway      Proposed Function          Reference
megT      EryBVI                                      Mycarose/Megosamine   2,3-Dehydratase            (Summers et al., 1997; Gaisser et al., 1997)
megDVI    EryCII                           63         Megosamine            3,4-Isomerase              (Summers et al., 1997)
megDI     EryCIII                          79         Megosamine            Glycosyltransferase        (Summers et al., 1997)
megY      AcyA (S. thermotolerans)         52                               Mycarose O-acyltransferase (Arisawa et al., 1994)
megDII    EryCI                            58         Megosamine            Aminotransferase           (Dhillon et al., 1989; Summers et al., 1997)
megDIII   DesVI (S. venezuelae)            61         Megosamine            Dimethyltransferase        (Xue et al., 1998)
megDIV    DmnU (S. peucetius)              65         Megosamine            3,5-Epimerase              (Olano et al., 1999)
megDV     Dehydrogenase (A. orientalis)    61         Megosamine            4-Ketoreductase            (Summers et al., 1997; van Wageningen et al., 1998)
megDVII   EryBII                           73         Megosamine            2,3-Reductase              (Summers et al., 1997)
megBV     EryBV                            86         Mycarose              Glycosyltransferase        (Summers et al., 1997; Gaisser et al., 1997)
megBIV    EryBIV                           80         Mycarose              4-Ketoreductase            (Summers et al., 1997; Gaisser et al., 1997)
megAI     EryAI                            81         6-dEB                 Polyketide Synthase        (Donadio and Katz, 1992)
megAII    EryAII                           85         6-dEB                 Polyketide Synthase        (Donadio and Katz, 1992)
megAIII   EryAIII                          83         6-dEB                 Polyketide Synthase        (Donadio and Katz, 1992)
megCII    EryCII                           82         Desosamine            3,4-Isomerase              (Summers et al., 1997)
megCIII   EryCIII                          89         Desosamine            Glycosyltransferase        (Summers et al., 1997)
megBII    EryBII                           87         Mycarose              2,3-Reductase              (Summers et al., 1997)
megH      EryH                             84                               Thioesterase               (Haydock et al., 1991)
megF      EryF                                                              C-6 Hydroxylase            (Weber et al., 1991)

a. Determined by BLASTX analysis using default parameters.

Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made for MegCII and MegDVI, two putative 3,4-isomerases
similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; and MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.
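
The pairwise assignment rule described above (the paralog more similar to the
shared ery homolog is assigned to the ery-like pathway, the other to the
megosamine pathway) can be sketched as follows. Only the three pairs for which
Table 2 lists both similarity values against the same ery protein are included;
the function name, dictionary layout, and labels are illustrative assumptions,
and the rule is a simplification of the reasoning in the text, which also
relies on the expression experiments.

```python
# Percent similarity of each Meg paralog to the shared ery homolog,
# taken from Table 2. Names and structure are for illustration only.
pairs = {
    "EryCIII": {"MegCIII": 89, "MegDI": 79},    # glycosyltransferases
    "EryCII": {"MegCII": 82, "MegDVI": 63},     # 3,4-isomerases
    "EryBII": {"MegBII": 87, "MegDVII": 73},    # 2,3-reductases
}

def assign(pairs):
    """For each ery homolog, assign the more similar Meg protein to the
    ery-like pathway and the less similar one to the megosamine pathway."""
    result = {}
    for ery, candidates in pairs.items():
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        result[ery] = {"ery_like": ranked[0], "megosamine": ranked[1]}
    return result

assignments = assign(pairs)
for ery, roles in assignments.items():
    print(ery, roles)
```

Under this rule MegCIII, MegCII, and MegBII fall in the desosamine or mycarose
pathways, and MegDI, MegDVI, and MegDVII in the megosamine pathway, matching the
designations in Table 2.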

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). EryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKA0127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKA0127'kan' encoding ery DEBS and pKOS108-06
encoding meg DEBS, were introduced in Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30 and 40 mg/L. In addition, both PKSs produced small amounts (~5%) of 8,8a-
deoxyoleandolide, which results from the priming of the PKS with acetate instead
of propionate (Kao et al., 1994b). This observation indicates that the loading AT
domains of the PKSs display similar relaxed specificities towards starter units.

Conversion of erythromycin to megalomicin in S. erythraea. 

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/pKOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicin A and C2 were also
detected in smaller amounts. The presence of the megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyl transferase, MegY, which
is present in the integrated meg fragment.

Discussion 

The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis and its origin in the ery cluster
has not been determined.

Upstream of the common meg/ery BIV and BV genes, the gene clusters
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery
gene cluster (Pereda et al., 1997), contains the remaining genes required for
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI)
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since
introduction of this meg DNA segment into S. erythraea results in production of
megalomicins, it is clear that these genes encode the functions for TDP-megosamine
biosynthesis and transfer to its putative substrate erythromycin C, and
for acylation of megalomicin A (Figure 8). The remaining region upstream of megDVI
should therefore encode genes only for mycarose and desosamine biosynthesis.

Olano et al. (1999) have recently described a pathway for
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius.
Their pathway proposes four steps from the intermediate TDP-4-keto-6-deoxyglucose
controlled by the gene cluster dnmJQTUVZ, although the functions
for dnmQ and dnmZ could not be identified and the precise order of reactions in
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and
megDV, respectively (see Figure 10).

It is possible to describe a pathway to convert TDP-2,6-dideoxy-3,4-diketo-D-hexose
(or its enol tautomer), the last intermediate common to the
mycarose and megosamine pathways, to TDP-megosamine through the sequence
of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation
employing the genes megDIV, megDV, megDII, and megDIII. This employs the
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al.,
but in a different sequential order. However, it does not account for the megDVI
and megDVII genes since their activities are not required for this route. A parallel
pathway which employs these genes is also shown in Figure 10. In this alternate
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and
megDVI gene products, respectively. A unified single pathway that employs both
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined.
Because the entire gene set from megDVI through megDVII was introduced in S.
erythraea to produce TDP-megosamine, it is not possible to determine which, if
either, of the two alternative pathways is operative, but this can be addressed
through systematic gene disruption and complementation.

The 48 kb segment sequenced also contains genes required for synthesis of
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which
encodes a putative 3,4-isomerase, the first step in the committed TDP-desosamine
pathway, appears to be translationally coupled to megAIII, almost exactly as its
erythromycin counterpart, eryCII, was found translationally coupled to eryAIII
(Summers et al., 1997). The high degree of similarity between MegCII and EryCII
suggests that the pathways to desosamine in the megalomicin- and erythromycin-producing
organisms are most likely the same. Similarly, the finding that megBII
and megBIV, encoding a 2,3-reductase and 4-ketoreductase, have close
homologs in the mycarose pathway for erythromycin also suggests that TDP-L-mycarose
synthesis in the two host organisms is the same.

Of interest are the two genes that encode putative 2,3-reductases, megBII
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore,
the lower degree of similarity between MegDVII and either EryBII or MegBII
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative
2,3-dehydratase, is also related to a gene in the ery mycarose pathway, eryBVI. In S.
erythraea, the proposed intermediate generated by EryBVI represents the first
committed step in the biosynthesis of mycarose (Figure 10). However, the
proposed pathways in Figure 10 suggest this may be an intermediate common to
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT
is named following the designation of the equivalent gene in the daunosamine
pathway, dnmT (Olano et al., 1999).

The preferred host-vector system for expression of meg DEBS described
here has been used previously for the heterologous expression of modular PKS
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999),
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the
generation of novel polyketide backbones where domains have been removed,
added or exchanged in various combinations (McDaniel et al., 1999). Recently,
hybrid polyketides have been generated through the co-expression of subunits
from different PKS systems (Tang et al., 2000).

Expression of the megDVI-megDVII segment in S. erythraea and the
corresponding production of megalomicins in this host establishes the likely order
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to
produce megalomicin in a more genetically friendly host organism, leading to the
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB
analogs have been produced by combinatorial biosynthesis using the ery PKS
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be
significantly increased above the 5 mg/L obtained from M. megalomicea by
introducing the genes into an industrially optimized strain of S. erythraea, many of
which can produce as much as 10 g/L of erythromycin.

Example 2

Stabilizing meg PKS Expression Plasmid by Codon Engineering

Materials and methods

All bacterial strains were cultured and transformed as described in
Example 1.

Fermentation of Streptomyces and diketide feeding

Primary Streptomyces transformants were picked and placed in 6 mL of
TSB liquid medium with 50 µg/L of thiostrepton and grown at 30°C. When the
culture showed some growth (3-4 days), it was transferred into a 250 mL flask
containing 50 mL of R6 medium (pH 7.0) with 25 µg/L of thiostrepton and 1 g/L of
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester)
and placed in a 30°C incubator for 7 days.



Changing codons and making plasmids

There are several identical sequences in the coding sequences for module 2
and module 6 of the megalomicin PKS gene cluster. Expression plasmids
containing the full length megalomicin PKS appeared to be somewhat unstable
and subject to deletion in recA+ strains like ET12567 and Streptomyces by
intra-plasmid homologous recombination. To prevent significant homologous
recombination and so stabilize expression plasmids, the codons of two regions of
the module 6 coding sequence that are identical to regions in the module 2 coding
sequence were changed without changing the sequence of the protein encoded. The
two regions changed in module 6 were from the 26,739th base to the 27,267th base and
from the 27,697th base to the 27,987th base, which were identical to the regions
from the 6,810th base to the 7,338th base and from the 7,778th base to the
8,068th base, respectively. The start codon of the loading domain of the meg PKS
was set to be the 1st base. These sequences are shown below.



> 6810-7338 Sequence in Module 2

TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 

GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23) 

> 26736-27267 Sequence in Module 6

CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 

> 26736-27267 Sequence with Codon Changes

CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25)



> 6978-7337 Sequence in Module 2

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 

TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26)

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 27)

> 27697-27987 Sequence with Codon Changes 

CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28)
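
The codon changes described above can be checked computationally. The following is a minimal sketch, not part of the original disclosure: it takes the first 60 bases of SEQ ID NO: 27 (native module 6 region) and of SEQ ID NO: 28 (the same region with codon changes), translates both, and confirms that the encoded amino acids agree while the DNA differs. The reading-frame offset of 1 is an assumption inferred from the pattern of base changes, and the helper names (translate, longest_identical_run) are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 60 bases of SEQ ID NO: 27 (native) and SEQ ID NO: 28 (codon-changed).
NATIVE = "GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA"
RECODED = "CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA"


def translate(dna, offset=0):
    """Translate complete codons of dna, starting at the given offset."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def longest_identical_run(a, b):
    """Length of the longest stretch of identical bases at aligned positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


# Assumption: the excerpts are in frame starting at the second base (offset 1).
FRAME = 1
same_protein = translate(NATIVE, FRAME) == translate(RECODED, FRAME)
print("encoded peptide unchanged:", same_protein)
print("longest identical run, native vs native: ",
      longest_identical_run(NATIVE, NATIVE))
print("longest identical run, native vs recoded:",
      longest_identical_run(NATIVE, RECODED))
```

Shortening the longest stretch of uninterrupted identity between the module 2 and module 6 regions is what is expected to reduce intra-plasmid homologous recombination, while the unchanged translation shows that the protein product is not altered.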



Three pieces of DNA from the two regions above were synthesized and verified by
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II-TOPO, as
shown in Table 3 below.

Table 3. Plasmids containing synthesized DNA

Plasmid        Cloning sites and positions in meg PKS
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base



Assembly of the expression plasmid 

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-BsmI
fragment of pKOS97-1622 and BsmI-PstI linearized pKOS97-90 produced
pKOS97-151. Then, the insertion of the SfaNI-FseI fragment of pKOS97-1628
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a and
produced pKOS97-160.

The final expression plasmid (in pRM5), pKOS97-162, was the result of
inserting the BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of
pKOS108-04.

Another expression plasmid, pKOS97-152a, was made by a four-fragment
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector).

Tests of the constructed plasmids showed that the plasmids containing the
modified coding sequences were more stable than plasmids containing unmodified
coding sequence.

Example 3
Construction of Ole-Meg Hybrid PKS
Construction of pRM1-based pKOS98-48 for the expression of OlePKS modules
1-4.

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID
NO: 29) and N98-38-3
(5'-CGGAATTCTCTAGACTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and
the engineered XbaI (TCTAGA) and EcoRI (GAATTC) sites at its 3'-end following the
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI, EcoRI treated
PCR fragment were ligated into Litmus 28 at the EcoRI site via a three-fragment
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was
cloned into pKOS38-174, a pRM1 derived plasmid containing oleAI and nt 1 to nt
2960 of oleAII, to give pKOS98-48.
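
The restriction sites named in the primer description can be located with a simple string search. The sketch below is illustrative only: the primer sequences are taken from SEQ ID NO: 29 and SEQ ID NO: 30 as given above, the recognition sequences are the standard ones for NotI, XbaI and EcoRI, and the function name find_sites is hypothetical. Because all three recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {"NotI": "GCGGCCGC", "XbaI": "TCTAGA", "EcoRI": "GAATTC"}

# Primer sequences as listed for SEQ ID NO: 29 and SEQ ID NO: 30.
PRIMERS = {
    "N98-38-1": "GAACAACTCCTGTCTGCGGCCGCG",
    "N98-38-3": "CGGAATTCTCTAGACTCACGTCTCCAACCGCTTGTCGAGG",
}


def find_sites(sequence, sites=SITES):
    """Return {enzyme: 0-based position} for the first hit of each site."""
    hits = {}
    for enzyme, motif in sites.items():
        position = sequence.find(motif)
        if position != -1:
            hits[enzyme] = position
    return hits


for name, sequence in PRIMERS.items():
    print(name, find_sites(sequence))
```

In the reverse primer N98-38-3 the EcoRI site lies 5' of the XbaI site, so after amplification both engineered sites sit downstream of the oleAII stop codon, with XbaI closer to the coding sequence.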
Construction of pSET152-based pKOS98-60 for the expression of megPKS
modules 5-6.

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR
amplified with primers N98-40-3
(5'-TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGCGGCATGACCG-3')
(SEQ ID NO: 31) and N98-40-2
(5'-AACGCCTCCCAGGAGATCTCCAGCA-3') (SEQ ID NO: 32). A PacI site and an NdeI site as well
as the ribosome binding site were introduced at the 5'-end of the megAIII start
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06 replacing
the 22-kb PacI-BglII fragment to yield pKOS98-55. The 10-kb PacI-XbaI
fragment containing the megAIII gene and the annealed oligos N98-23-1
(5'-AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2
(5'-CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to PacI and EcoRI
treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give
pKOS98-60.
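
The role of the annealed oligos N98-23-1 and N98-23-2 in this three-fragment ligation can be illustrated by working out the duplex they form. The sketch below is an illustration under stated assumptions rather than a description of the original procedure: it assumes simple Watson-Crick pairing of the two oligos as listed (SEQ ID NO: 33 and SEQ ID NO: 34), and the function names are hypothetical. It reports the paired core and the unpaired 5' end of each strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def anneal(top, bottom):
    """Pair two oligos and return (top 5' overhang, duplex core, bottom 5' overhang).

    The core is the longest suffix of `top` that matches a prefix of the
    reverse complement of `bottom`; both values are written 5' to 3'.
    """
    bottom_rc = reverse_complement(bottom)
    for length in range(min(len(top), len(bottom_rc)), 0, -1):
        if top[-length:] == bottom_rc[:length]:
            return top[:-length], top[-length:], bottom[:len(bottom) - length]
    raise ValueError("oligos share no complementary region")


N98_23_1 = "AATTCATAGCCTAGGT"  # SEQ ID NO: 33
N98_23_2 = "CTAGACCTAGGCTATG"  # SEQ ID NO: 34

top_overhang, core, bottom_overhang = anneal(N98_23_1, N98_23_2)
print("top 5' overhang:   ", top_overhang)
print("duplex core:       ", core)
print("bottom 5' overhang:", bottom_overhang)
```

The duplex carries a 5'-AATT overhang, compatible with an EcoRI-cut end, and a 5'-CTAG overhang, compatible with an XbaI-cut end, which is consistent with its use to bridge the XbaI end of the megAIII fragment to the EcoRI end of the vector. The paired core also contains the sequence CCTAGG, the recognition site of AvrII.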

Example 4 

Conversion of Erythronolides to Erythromycins 
A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a
three day old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant)
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated
at 30°C for four days. The agar is chopped and then extracted three times with 100
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and
evaporated. The crude product is purified by preparative HPLC (C-18 reversed
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are
analyzed by mass spectrometry, and those containing pure compound are pooled,
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved
in water and extracted three times with equal volumes of ethyl acetate. The
organic extracts are combined, washed once with saturated aqueous NaHCO3,
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The
product is a glycosylated and hydroxylated compound corresponding to
erythromycin A, B, C, and D but differing therefrom as the compound provided
differed from 6-dEB.

Example 5 

Measurement of Antibacterial Activity

Antibacterial activity is determined using either disk diffusion assays with 
Bacillus cereus as the test organism or by measurement of minimum inhibitory 
concentrations (MIC) in liquid culture against sensitive and resistant strains of 
Staphylococcus pneumoniae. 
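
As a hedged illustration of how an MIC is read from a liquid-culture dilution series, the sketch below applies the general definition (the lowest tested concentration at which no growth is observed, with no growth at any higher concentration either). The concentrations, the growth readings and the function name are hypothetical and are not data from this example.

```python
def minimum_inhibitory_concentration(readings):
    """Return the MIC from (concentration, grew) pairs, or None.

    The MIC is taken as the lowest concentration showing no growth such that
    every higher tested concentration also shows no growth. None is returned
    when growth occurs even at the highest concentration tested.
    """
    mic = None
    for concentration, grew in sorted(readings, reverse=True):
        if grew:
            break
        mic = concentration
    return mic


# Hypothetical two-fold dilution series (concentration in ug/mL, growth seen?).
example_readings = [(64.0, False), (32.0, False), (16.0, False),
                    (8.0, True), (4.0, True)]
print(minimum_inhibitory_concentration(example_readings))
```

Comparing the value obtained for a sensitive strain with that for a resistant strain gives a direct measure of how far resistance shifts the inhibitory concentration of a given compound.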

Example 6
Evaluation of Antiparasitic Activity
Compounds can initially be screened in vitro using cultures of P. falciparum
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian
cell toxicity can be determined in FM3A or KB cells. Compounds can also be
screened for activity against P. berghei. Compounds are also tested in animal studies
and clinical trials to test the antiparasitic activity broadly (antimalarial,
trypanosomiasis and Leishmaniasis).

The invention having now been described by way of written description
and example, those of skill in the art will recognize that the invention can be
practiced in a variety of embodiments and that the foregoing description and
examples are for purposes of illustration and not limitation of the following
claims.

Claims

1. An isolated nucleic acid comprising a nucleotide sequence
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin
modification enzyme.

2. The isolated nucleic acid of claim 1, which encodes a PKS open
reading frame (ORF) selected from the group consisting of megAI, megAII and
megAIII.

3. The isolated nucleic acid of claim 1, wherein the PKS domain is
selected from the group consisting of a TE domain, a KS domain, an AT domain,
an ACP domain, a KR domain, a DH domain, and an ER domain.

4. The isolated nucleic acid of claim 1, wherein the nucleic acid
comprises the coding sequence for a loading module, a thioesterase domain, and
all six extender modules of megalomicin PKS.

5. The isolated nucleic acid of claim 1, which encodes a megalomicin
modification enzyme that is involved in the conversion of 6-dEB into a
megalomicin.

6. The isolated nucleic acid of claim 5, which encodes a megalomicin
modification enzyme that is involved in the biosynthesis of mycarose,
megosamine or desosamine.

7. The isolated nucleic acid of claim 1, wherein the nucleic acid
codons of homologous regions within the PKS or the megalomicin modification
enzyme coding sequence have been changed to reduce or abolish the homology
without changing the amino acid sequences encoded by said changed nucleic acid
codons.

8. The isolated nucleic acid of claim 1, which isolated nucleic acid
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in
SEQ ID NO: 1.

9. A polypeptide, which is encoded by the isolated nucleic acid
fragment of claim 1.

10. A recombinant DNA expression vector, comprising the isolated
nucleic acid of claim 1 operably linked to a promoter.

11. A recombinant host cell, comprising the recombinant DNA
expression vector of claim 10.

12. The recombinant host cell of claim 11, which is a Streptomyces or
Saccharopolyspora host cell.

13. A recombinant host cell of claim 11, which comprises:

a) at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprises a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter; or

b) at least one autonomously replicating recombinant DNA expression
vector and at least one modified chromosome, each of said vector(s) and each of
said modified chromosome comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter.

14. A hybrid PKS that comprises a polypeptide of claim 9 and is
composed of at least a portion of a megalomicin PKS and at least a portion of a
second PKS for a polyketide other than megalomicin.

15. The hybrid PKS of claim 14, wherein the second PKS is selected
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS
PKS.

16. The hybrid PKS of claim 15 that is composed of the megAI and
megAII gene products and the oleAIII gene product.

17. The hybrid PKS of claim 16, wherein the KS domain of module 1
of the megAI gene product has been inactivated by mutation.

18. A method of producing a polyketide, which method comprises
growing the recombinant host cell of claim 11 under conditions whereby the
megalomicin PKS domain encoded by the recombinant expression vector is
produced and the polyketide is synthesized by the cell, and recovering the
synthesized polyketide.

19. A recombinant host cell that comprises a recombinant expression 
vector that encodes a megalomicin modification enzyme. 

20. The recombinant host cell of claim 19 that produces megosamine
and can attach megosamine to a polyketide, wherein said host cell, in its naturally
occurring non-recombinant state cannot produce megosamine.

LOCUS       1       47981 bp    DNA             01-MAY-2000
DEFINITION  Megalomicin biosynthetic gene cluster, polyketide synthase,
            desosamine, megosamine, and mycarose biosynthesis genes.
ACCESSION   1
VERSION
KEYWORDS
SOURCE      Micromonospora megalomicea.
  ORGANISM  Micromonospora megalomicea
            Unclassified.
REFERENCE   1 (bases 1 to 47981)
  AUTHORS   Volchegursky,Y., Hu,Z., Katz,L. and McDaniel,R.
  TITLE     Biosynthesis of the Anti-Parasitic Agent Megalomicin:
            Transformation of Erythromycin to Megalomicin in Saccharopolyspora
            erythraea
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 47981)
  AUTHORS   McDaniel,R. and Volchegursky,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-MAY-2000) Kosan Biosciences, Inc., 3828 Bay Center
            Place, Hayward, CA 94545, USA
FEATURES             Location/Qualifiers
     source          1..47981
                     /organism="Micromonospora megalomicea"
                     /strain="NRRL3275"
                     /sub_species="nigra"
     gene            complement(<1..144)
                     /gene="megT"
     CDS             complement(<1..144)
                     /gene="megT"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyglucose-2,3-dehydratase"
                     /translation="MGDRVNGKATPESTQSAIRFLTRHGGPPTATDDVHDWLAHRAAE
                     ERLE" (SEQ ID NO: 2)

     gene            928..2061
                     /gene="megDVI"
     CDS             928..2061
                     /gene="megDVI"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 3,4-isomerase"
                     /translation="MAVGDRRRLGRELQMARGLYWGFGANGDLYSMLLSGRDDDPWTW

YERLRAAGRGPYASRAGTWWGDHRTAAEVLADPGFTHGPPDAARWMQVAHCPAASWA 

GPFREFYARTEDAASVTVDADWLQQRCARLVTELGSRFDLViroFAREVPVLALGTAPA 

LKGVTD PDRLRS WTS ATRVC LDAQVS PQQLAVT EQALTALDE I DAVTGGRDAAVLiVGW 

AElAANTVGNAVIiAVTELPELAARLADDPETATRVVTEVSRTSPGVHIjERRTAA 

VGG VD VPTGGE VTVWAAANRD PEVFTDPDRFD VDRGGD AE I LS S RPGS PRTDLDALV 

ATL AT AALRAAAP VL P RL S RSG P VI RRRRS P VARGL S RC P VE L " (SEQ ID NO: 3) 

     gene            2072..3382
                     /gene="megDI"
     CDS             2072..3382
                     /gene="megDI"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-megosamine glycosyltransferase"
                     /translation="MRWFSSMAVNSHLFGLVPLASAFQAAGHEVRWASPALTDDVT
GAGLTAVPVGDDVELVEWHAHAGQDIVEYMRTLDWVDQSHTTMSWDDIjbGMQTTFTPT 
FFAIiMSPDSIilDGMVEFCRSWRPDWIVWEPLTFAAPIAARVTGTPHARMLWGPDVATR 
ARQSFLRLLAHQEVEHREDPLAEWFDWTLRRFGDDPHIjSFDEELVIjGQWTVDPIPEPL 
RIDTGVRTVGMRYVPYNGPSWPAWIiLiREPERRRVCLTLGGSSREHGIGQVSIGEMLD 
AI AD I DAEF VATFDDQQLVG VGS VPANVRTAG F VPMNVJjL PTCAATVHHGGTG S WIjTA 
AI HG VPQ 1 1 LSDADTEVHAKQLQDLGAGLS LP VAGMTAEHLRG AI ERVLDE PAYRI*GA 
ERMRDGMRTDPSPAQWGICQDLAADRAARGRQPRRTAEPHLPR" (SEQ ID NO: 4) 
     gene            3462..4634

WO 01/27284 PCT/US00/27433 



                     /gene="megY"
     CDS             3462..4634
                     /gene="megY"
                     /codon_start=1
                     /transl_table=11
                     /product="mycarose O-acyltransferase"

/ 1 ran slation=" MVTSTNLDTTARPALNS I/TGMR FVAAFL VF FTHVLS RL I PNS YV 
YADGLDAFWQTTGRVGVSFFFILSGFVLTWSARASDSVWSFWRRRVCKLFPNHLVTAF 
AAWLFLVTGQAVSGEALIPNLIiLIHAWFPALEISFGINPVSWSLACEAFFYLCFPLF 
LFWISGIRPERLWAWAAWFAAIWAVPWADLLLPSSPPLIPGLEYSAIQDWFLYTFP 
ATRSLEFII/SIIIARILITGRWINVGLLPAVIiliFPVFFVASLFLPGVYAISSSMMILP 
LVLI IASGATADLQQKRTFMRNRVMVWLGDVS FALYMVHFLVI VYGADLLGFSQTEDA 
PLGLALFMIIPFLAVSLVLSWLLYRFVELPVMRNWARPASARRKPATEPEQTPSRR" (SEQ ID NO: 5)
     gene            4651..5775
                     /gene="megDII"
     CDS             4651..5775
                     /gene="megDII"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-3-keto-6-deoxyhexose 3-aminotransaminase"
                     /translation="MTTYVWSYLLEYERERADILDAVQKVFASGSLILGQSVENFETE
YARYHGIAHCVGVDNGTNAVKLALESVGVGRDDEWTVSNTAAPTVLAIDEIGARPVF 
VDVRDEDYLMDTDLVEAAVTPRTKAIVPVHLYGQCVDMTALRELADRRGLKLVEDCAQ 
AHGARRDGRLAGTMSDAAAFSFYPTKVLGAYGDGGAVVTNDDETARALRRLRYYGMEE 
VYYVTRTPGHNSRLDEVQAEILRRKLTRLDAYVAGRRAVAQRYVDGLADLQDSHGLEL 
P WTDGNEHVF YVYWRHPRRDE 1 1 KRLRDGYD ISLNI S YPWP VHTMTGFAHLG VASG 
SLPVTERLAGEIFSLPMYPSLPHDLQDRVIEAVREVITGL" (SEQ ID NO: 6)
     gene            5822..6595
                     /gene="megDIII"
     CDS             5822..6595
                     /gene="megDIII"
                     /codon_start=1
                     /transl_table=11
                     /product="daunosaminyl-N,N-dimethyltransferase"
                     /translation="MPNSHSTTSSTDVAPYERADIYHDFYHGRGKGYRAEADALVEVA
RKHTPQAATLLDVACGTGSHLVELADSFREWGVT5LSAA>1I^TAAROTPGRELHQGDM 
RD F S LDRRFDWTCM F S S TG YLVDE AE LDRAVANLAGHL APGGTLVVE P WW F PET FRP 
GWVGADLVTSGDRRISRMSHTVPAGLPDRTASRMTIHYTVGSPEAGIEHFTEVHVNTL 
FARAAYEQAFQRAGLSCSYVGHDLFSPGLFVGVAAEPGR" (SEQ ID NO: 7) 

     gene            6592..7197
                     /gene="megDIV"
     CDS             6592..7197
                     /gene="megDIV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 3,5-epimerase"
                     /translation="MRVEELGIEGVFTFTPQTFADERGVFGTAYQEDVFVAALGRPLF
P V AQ VS TTR S RRG WRG VH FTTM PG S MAKYVYC ARGRAMD F AVD I RPG S PT FGRAE P V 
ELSAESMVGLYLPVGMGHLFVSLEDDTTLVYLMSAGYVPDKERAVHPIiDPELALPIPA 
DLDLVMSERDRVAPTLREARDQGILPDYAACRAAAHRWRT" (SEQ ID NO: 8) 

     gene            7220..8206
                     /gene="megDV"
     CDS             7220..8206
                     /gene="megDV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"

/ trans la t ion- " MVVIX5ASGFLGS AVTHALADLPVRVRLVARREVVVPSGAVADYE 

THR VDLTEPGALAE WADARAVFP FAAQ IRGTS G WR I S EDD WAERTWVGIjVRDIi IAV 

LSRSPHAPVWFPGSNTQVGRVTAGRVIDGSEQDHPEGVYDRQKHTGEQLIjKEATAAG 

AI RATS LRLP P VFGVP AAGTADDRG WSTM I RRAIiTGQPLTMWHDGTVRRELLYVTDA 

ARAFVTALDHADALAGRHFLLGTGRSWPLGEVFQAVSRSVARHTGEDPVPWSVPPPA 

HMDPSDLRSVEVDPARFTAVTGWRATVTMAEAVDRTVAALAPRRAAAPSEPS" (SEQ ID NO: 9)
     gene            complement(8228..9220)
                     /gene="megDVII"
     CDS             complement(8228..9220)
                     /gene="megDVII"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 2,3-reductase"

/trans 1 at ion= M MGTTG AGS ARVRVGRSALHTSRLWLGTVNFSGR VTDDDALRLMD 

KALERGVNC IDTAD I YG WRL YKGHTE ELVGRWFAQGGGRREETVLATKVGS EMS ERVN 

DGGLSARHIVAACEWSLRRIiGVDHIDIYQTHHIDRAAPWDEVWQAAEHLVGSGKVGYV 

GSSNLAGWHIAAAQESAARRNLLGMISHQCLYNLAVRHPELDVLPAAQAYGVGVFAWS 

PLHGGLLSGVLEKIJ^GTAVKSA<^RAQVLLPAVRPLVEAYEDYCRRI^ADPAEVGI^ 

WLSRPGILGAVIGPRTPEQLDSAIiRAAELTLGEEELRELEAIFPAPAVDGPVP" (SEQ ID NO: 10)
     gene            complement(9226..10479)
                     /gene="megBV"
     CDS             complement(9226..10479)
                     /gene="megBV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-mycarose glycosyltransferase"

/translation^' MRVLLTSFAHRTHFQGLVPIiAWALHTAGHDVRVASQPELTDVVV 
GAGLTSVPLGSDHRLFDISPEAAAQVHRYTTDLDFARRGPELRSWEFLHGIEEATSRF 
VFPV^^SFVDELVEFAMDWRPDLVLWEPFTFAGAVAAKACGAAHARLLWGSDLTGY 
FRSRSQDLRGQRPADDRPDPLGGWIiTEVAGRFGLDYSEDIiAVGQWSVDQLPESFRLET 
GLESVHTRTLPYNGSSWPQWLRTSDGVRRVCFTGGYSA1.GITSNPQEFLRTLATLAR 
FDGEIWTRSGLDPASVPDMVRLVDFVPMNILLPGCAAVIHHGGAGSWATAIiHHGVPQ 

ISVAKEWDCVLRGQRTAELGAGVFLRPDEVDADTLWQALATVVEDRSHAENAEKLRQE 
ALAAPTPAEWPVXiEAL AHQHRADR " (SEQ ID NO: 11) 

     gene            complement(10483..11424)
                     /gene="megBIV"
     CDS             complement(10483..11424)
                     /gene="megBIV"
                     /codon_start=1
                     /transl_table=11
                     /product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"

/ 1 r ans 1 a t ion= " MTRHVTLLGVSGFVGS ALLREFTTHPLRLRAVARTGSRDQPPGS 

AG I EHLRVDLLE PGRVAQ VVADTDVVVHLVAYAAGGSTWRS AATVPEAERVNAGIMRD 

LVAALRARPGPAPVLLFASTTQAANPAAPSRYAQHKI EAERILRQATEDGWDGVILR 

L PA I YGHSG PSGQTGRG WTAMI RRALAGEP ITMWHEGS VRRMLLHVED VATAFTAAL 

H^THEALVGDVWTPSADEARPLGEIFETVAASVARQTGNPAVPWSVPPPENAEANDFR 

SDDFDSTEFRTLTGWHPRVPLAEGIDRTVAAliISTKE " (SEQ ID NO; 12) 

     gene            12181..22821
                     /gene="megAI"
     CDS             12181..22821
                     /gene="megAI"
                     /note="polyketide synthase"
                     /codon_start=1
                     /transl_table=11
                     /product="megalomicin 6-deoxyerythronolide B synthase 1"

/translation="MVDVPDLLGTRTPHPGPLPFPWPLCGHNEPELRARARQLHAYLE

GISEDDVVAVGAALARETRAQDGPHRAVVVASSVTELTAALAALAQGRPHPSVVRGVA

RPTAPVVFVLPGQGAQWPGMATRLLAESPVFAAAMRACERAFDEVTDWSLTEVLDSPE

HLRRVEVVQPALFAVQTSLAALWRSFGVRPDAVLGHSIGELAAAEVCGAVDVEAAARA

AALWSREMVPLVGRGDMAAVALSPAELAARVERWDDDVVPAGVNGPRSVLLTGAPEPI

ARRVAELAAQGVRAQVVNVSMAAHSAQVDAVAEGMRSALTWFAPGDSDVPYYAGLTGG

RLDTRELGADHWPRSFRLPVRFDEATRAVLELQPGTFIESSPHPVLAASLQQTLDEVG

SPAAIVPTLQRDQGGLRRFLLAVAQAYTGGVTVDWTAAYPGVTPGHLPSAVAVETDEG

PSTEFDWAAPDHVLRARLLEIVGAETAALAGREVDARATFRELGLDSVLAVQLRTRLA

TATGRDLHIAMLYDHPTPHALTEALLRGPQEEPGRGEETAHPTEAEPDEPVAVVAMAC

RLPGGVTSPEEFWELLAEGRDAVGGLPTDRGWDLDSLFHPDPTRSGTAHQRAGGFLTG

ATSFDAAFFGLSPREALAVEPQQRITLELSWEVLERAGIPPTSLRTSRTGVFVGLIPQ

EYGPRLAEGGEGVEGYLMTGTTTSVASGRVAYTLGLEGPAISVDTACSSSLVAVHLAC

QSLRRGEST^LALAGGVTVMPTPGMLVDFSR^^^SIaAPDGRSKAFSAAAIX3FGMAEGAGM 

LLLERLSDARPJIGHPVI^AVIRGTAVNSDGASNGIiSAPNGRAQVRVIRQAIiAESGIiTPH 

TVDWETHGTGTRLGDPIEARALSDAYGGDREHPLRIGSVKSNXGHTQAAAGVAGLXK 
LVLAMQAGVl^PRTLHADEPSPEIDWSSGAISLLQEPAAVJPAGERPRRAGVSSFGISGT

NAHAIIEEAPPTGDDTRPDRMGPWPVA-TjSASTGEALRARAARLAGHLREHPDQDLDD 

VAYSLATGRAALAYRSGFVPADASTALRILDELAAGGSGDAVTGTARAPQRWFVFPG 

QGX^/QWAGMAVDLLDGDPVFASVLRECADALEPYLDFEIVPFLRAEAQRRTPDHTLSTD 

RVD WQP VL FAVMVS LAARWRA YG VE P AAV I GH SQG E I AAAC VAG ALS LDD AARAVAL 

RS RV I ATMPGNG AMAS I AAS VD E VAAR I DGR VE I AAVNG PRA WVSGDRDDLDRLVAS 

CTVEGVRAKRLPVDYASHSSHVEAVRDALHAEI/3EFRPLPGFVPFYSTVTGRWVEPAE 

LD AG YWFRNLRHR VR FAD AVRS L ADQG YTT FLE VS AH P VLTT AI EE IGEDRGGDLVAV 

HS LRRG AGG P VD FG S ALARAFVAG VA.VDV7E S AYQG AG ARRVP LPT YP FQRE R F W LE PN 

PARRVADSDDVSSLRYRIEWHPTDPGEPGRLDGTWLLATYPGRADDRVEAARQALESA 

GARVEDLVVEPRTGRVDLVRRLDAVGPVAGVLCLFAVAEPAAEHSPLAVTSLSDTLDL 

TQ AVAG S GREC P I WVVTENAVAVGP FE RLRD P AHG ALV7ALGRVVAIiEN P AVWGGLVD V 

PSGSVAELSRHLGTTLSGAGEDQVALRPDGTYARRWCRAGAGGTGRWQPRGTVLVTGG 

TGGVGRHVARWLARQGTPCLVLASRRGPDADGVEELLTELADLGTRATVTACDVTDRE 

QLRALIiATVDDEHPLSAVFHVAATLDDGTVETLTGDRIERANRAKVLGARNLHELTRD 

ADLDAFVLFSSSTAAFGAPGLGGYVPGNAYLDGLAQQRRSEGLPATSVAWGTWAGSGM 

AEG P VAD RF RRHG VMEMHPDQ AVEGL R V ALVQG E V AP I WD I RWDR F L LAYTAQRPTR 

LFDTLDEARRAAPGPDAGPGVAAIiAGLPVGEREKAVliDLrVRTHAAAVIjGKASAEQVPV 

DRAFAELGVDSLSALELRNRLTTATGVRIiATTTVFDHPDVRTLAGHLAAELGGGSGRE 

RPGGEAPTVAPTDEPIAIVGMACRLPGGVDSPEQLWELIVSGRDTASAAPGDRSWDPA 

ELMVSDTTGTRTAFGNFMPGAGEFDAAFFGISPREAIiAMDPQQRHALETTWEALENAG 

I RPE S LRGTDTG VF VGMS HQG Y ATG R P KP EDE VDG YLLTGNTAS VASGR I AYVLG LEG 

PAITVDTACSSSLVALHVAAGSLRSGDCGLAVAGGVSVMAGPEVFREFSRQGALAPDG 

RCKPFSDEADGFGLGEGSAFWLQRLSVAVREGRRVLGWVGSAVNQDGASNGLAAPS. 

GVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGPW 

VGSVKANVGHVQAAAGWGVIKVVLGLGRGLVGPMVCRGGLSGLVDWSSGGLVVADGV 

RG WP VGVDG VRRGG VS AFG VSGTNAHVWAE APG S WGAERP VEG S S RG LVGWGGW 

PWLSAKTETALHAQARRLADHLETHPDVPMTDVVWTLTQARQRFDRRAVLLAADRTQ 

AVERLRGLAGGEPGTGWSGVASGGGWFVFPGQGGQWGMARGLLSVPVFVESWEC 

DAWSSWGFSVLGVLEGRSGAPSLDRVDVVQPVLF\A/iyrVSLJ^LWRWCGWPAAVVG 

HSQGEIAAAVVAGVXiSVGDGARVVALRARALRALAGHGGMASVRRGRDDVQKLLDSGP 

VJTGKLEIAAVNGPDAVVVSGDPRAVTELVEHCDGIGVRARTIPVDYASHSAQVESLRE 

ELLS VLAG I EGRP ATVPF YSTLTGGFVDGTELDADYWYRNLRHP VRFHAAVEALAARD 

LTTFVEVSPHPVLSMAVGETLADVESAVTVGTLERDTDDVERFLTSLAEAHVHGVPVD 

WAAVLGSGTLVDLPTYPFQGRRFWLHPDRGPRDDVADWFHRVDWTATATDGSARLDGR 

WLVWPEGYTDDGWWEVRAALAAGGAEPWTTVEEVTDRVGDSDAWSMLGLADDGA 

AETLALLRRLDAQASTTPLWWTVGAVAPAGPVQRPEQATVWGLALVASLERGHRWTG 

LLDLPQTPDPQLRPRLVEALAGAEDQVAVRADAVHARRIVPTPVTGAGPYTAPGGTIL 

VTGGTAGLGAVTARWLAERGAEHLALVSRRGPGTAGVDEWRDLTGLGVRVSVHSCDV 

GDRESVGALVQELTAAGDVVRGVVHAAGLPQQVPLTDMDPADIxADVVAVKVDGAVHIiA 
DLCPEAELFLLFSSGAGVWGSARQGAYAAGNAFLDAFARHRRDRGLPATSVAWGLWAA 
GGMTGDQEAVSFLRERGVRPMSVPRALEALERVLTAGETAVWADVDWAAFAESYTSA 
RPRPLLHRLVTPAAAVGERDEPREQTLRDRLAALPRAERSAELVRLVRRDAAAVLGSD 
AKAVPATTP F KDLG FD S L AAVRFRNRLAAHTG LRLP ATLVFEHPNAAAVAD LLHDRLG 
EAGEPTPVRSVGAGLAALEQALPDASDTERVELVERLERMLAGLRPEAGAGADAPTAG 
DDLGEAGVDELLDALERELDAR" (SEQ ID NO: 13) 

misc_feature 12505..13470

/gene="megAI"

/function="AT-L"
misc_feature 13576..13791

/gene="megAI"

/function="ACP-L"
misc_feature 13849..15126

/gene="megAI"

/function="KS1"
misc_feature 15427..16476

/gene="megAI"

/function="AT1"
misc_feature 17155..17694

/gene="megAI"

/function="KR1"
misc_feature 17947..18207

/gene="megAI"

/function="ACP1"
/gene="megAI"

/function="KS2"

misc_feature 19876..20910

/gene="megAI"

/function="AT2"

misc_feature 21517..22053

/gene="megAI"

/function="KR2"

misc_feature 22318..22575

/gene="megAI"

/function="ACP2"

gene 22867..33555

/gene="megAII"

CDS 22867..33555

/gene="megAII"

/note="polyketide synthase"

/codon_start=1

/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 2"

/translation="MTDOTKVAEYLPJ^TLDLRAARKRLRELQSDPIAVVGNIACRLPG

GVHLPQHLVTOLLRQGHETVSTFPTGRGWDLAGLFHPDPDHPGTSYVDRGGFLDDVAGF 

DAEFFG I S PRE ATAMDPQQRLLLETS WELVES AG IDPHSLRGTPTG VFLGVARLGYGE 

NGTEAGDAEGYSVTGVAPAVASGRI S YALGLEGPS I SVDTACSS SLVALHLAVESLRL 

GESS LA WGGAAVMATPG VF VD FS RQRAL AADGRS KAFG AAADG FGFSEG VSL VLLER 

LSEAESNGHEVLAVIRGSALNQDGASNGLAAPMGTAQRKVIRQALRNCGLTPADVDAV 

EAHGTGTTLGDPIEANALLDTYGRDRDPDHPLWLGSVKSNIGHTQAAAGVTGLLKMVL 

ALRHEELPATLHVDEPTPHVDWSSGAVRLATRGRPWRRGDRPRRAGVSAFGISGTNAH 

VIVEEAPERTTERTVGGDVGPVPLWSARSAAALRAQAAQVAELVEGSDVGLAEVGRS 

LAVTRARHEHRAAWASTRAEAVRGLREVAAVEPRGEDTVTGVAETSGRTWFLFPGQ 

GSQWVGMGAELLDSAPAFADTIRACDEAMAPLQDWSVSDVLRQEPGAPGLDRVDVVQP 

VL F AVMVS LAE.LW Q S YG VT P AAWGHSQGE I AAAHVAGAL S LADAARL WGRS RLLRS 

LSGGGGMSAVALGEAEVRRRLRSWEDRISVAAVNGPRSWVAGEPEALREWGREREAE 

GVRVREIDVDYASHSPQIDRVRDELLTVTGEIEPRSAEITFYSTVDVRAVDGTDLDAG 

YWYRNLRETVRFADAMTRLADSGYDAFVEVSPHPVWSAVAEAVEEAGTODAVVVGTL 

SRGDGGPGAFLRSAATAHCAGVDVDWTPALPGAATIPLPTYPFQRKPYWLRSSAPAPA 

SHDl J AYRVSWTPITPPGDGVI J DGDWL\AmPGGSTGWVDGLAAAITAGGGRWAHPVDS 

VTS RTG LAEALARRDGTFP.GVIjSWVATDERHVEAGAVALLTLAQAIjG DAG ID APLWCL 

TQEAVT^TPVDGDLARPAQAALHGFAQVARLEIiARRFGGVLDLPATVDAAGTRLVAAVL 

AGGGEDWAWGDRLYGRPXVRATLPPPGGGFTPHGTVLVTGAAGPVGGRL^WLAER 

GATRIiVT^PGAHPGEELLTAIRAAGATAVVCEPEAEALRTAIGGELPTALVHAETLTOF 

AGVADADPEDFAATVAAKTALPTVLAEVLGDHRLEREVYCSSVAGVWGGVGMAAYAAG 

SAYLDALVEHRRARGHASASVAWTPWAIiPGAVDDGRLRERGLRSLDVADALGTWERLL 

RAGAVSVAVADVDWSVFTEGFAAIRPTPLFDELLDRRGDPDGAPVDRPGEPAGEWGRR 

IAAIiSPQEQRETLLTLVGETVAEV1jGKETGTEINTRRAFSEL.GLDSLGSMALRQRLiAA 

RTGLRMPASLVFDHPTVTALARYLRRLWGDSDPTPVRVFGPTDEAEPVAWGIGCRF 

PGGIATPEDLWRWSEGTSITTGFPTDRGWDLRRLYHPDPDHPGTSYVDRGGFLDGAP 

DFDPGFFGITPREAIaAMDPQQRLTLEIAWEAVERAGIDPETLLGSDTGVFVGMNGQSY 

LQLLTGEGDRLNGYQGLGNSASVLSGRVAYTFGWEGPALTVDTACSSSLVAIHLAMQS 

LRRGECSLAIAGGVTVMADPYTFVDFSAQRGLAADGRCKAFSAQA^ 

LEPLSKARRNGHQVLAVLRGSAVNQDGASNGLAAPNGPSQERV1RQALTASGLRPADV 

DMVE AHGTGTE LGD P I E AG AL I AAYGRDRDRPLW LGS VKTN I GHTQAAAGAAGVIKAV 

LAMRHGVLPRS LHADELS PH ID W ADG KVEVLRE ARQWP PGERPRRAG VS S FGVSGTNA 

HVIVEEAPAEPDPEPVPAAPGGPLPFVLHGRSVQTVRSOARTLAEHLRTTGHRDLADT 

ARTIxATGRARFDVRAAVIiGTDREGVCAALDALAQDRPSPDVVAPAVFAARTPVIiVFPG 

QGSQWGMARDLIiDSSEVFAESMGRCAEALSPYTDWDLIiDVVRGVGDPDPYDRVDVLQ 

P VLF AVMVS LARLWQS YGVT PG AWGHSQGE I AAAHVAG ALS LAD AARWALRS RVLR 

ELDDQGGMVS VGT S RAELD S VLRRWDGRVAVAAVNG PGTL WAG PTAELDE FLAVAEA 

REMRPRRIAVRYASHSPEVARVEQRLAAELGTVTAVGGTVPLYSTATGDLLDTTAMDA 

GYWYRNLRQPVLFEHAVRSLLERGFETFIEVSPHPVLLMAVEETAEDAERPVTGVPTL 

RRDHDGPSEFLRNLLGAHVHGVDVDLRPAVAHGRLVDLPTYPFDRQRLWPKPHRRADT 

SSLGVRDSTHPLLHAAVDVPGHGGAVFTGRLSPDEQQWLTQHWGGRNLVPGSVLVDL 

ALT AGADVGVP VLEELVLQQ PLVLTAAGALLRLS VGAADEDGRRPVE IHAAED VSDPA 

EARWSAYATGTLAVGVAGGGRDGTQWPPPGATALTLTDHYDTLAELGYEYGPAFOALR 
AAWQHGD WYAEVS LD AVEEG Y AFD P VLLD AVAQT FGLTS RAPGKLP F AWRGVTLHAT
GATAVR WATPAG PDAVALR VTDPTGQL VATVDALiVVRDAGADRDQ PRG RDG DLHRL E 
WVRLATPDPTPAAWHVAADGLDDLLRAGGPAPQAWVRYRPDGDDPTAEARHGVLWA 
ATLVRRWLDDDRWPATTLWATSAGVEVSPGDDVPRPGAAAVWGVLRCAQAESPDRFV 
LVDGDPETPPAVPDNPQLAVRDGAVFVPRLTPLAGPVPAVADRAYRLVPGNGGSIEAV 
AFAPVPDADRPIAPEEWVAVRATGWFRDVLLALGMYPEPAEMGTEASGVVTEVGSG 
VRR FTPGQ AVTG LFQGAFGP VAVADHRLLTP VPDGWRAVDAAAVPI AFTT AH YALHDL 
AGLQAGQSVLVHAAAGGVGKAAVAIiARRAGAEVFATASPAKHPTLRAIiGLDDDHIASS 
RESGFGERFAARTGGRGVDWLNSLTGDLLDESARLLADGGVFVEMGKTDIiRPAEQFR 
GRYVPFDLAEAGPDRLGEILEEWGLLAAGALDRLPVSVWELSAAPAALTHMSRGRHV 
GKLVLTQPAPVTIPbGTVLVTGGTGTLGRLVARHLVTGHGVPHLLVASRRGPAAPGAAE 
LRADVEGLGAT I E I VACDTADREALAAI/LDS I P ADRPLTGWHTAGVLADGLVTS I DG 
TATDQVLRAKVDAAWHLHDLTRDADLS FFVLFS S AAS VLAGPGQGVYAAANGVLNAJLA 
GQRRALGLP AKALGWGLW AQ AS EMTSGLGDRI ARTGVAALPTERAL ALFD AALRSGGE 
VliFPLSVDRSAIiRRAEYVPEVLRGAVRSTPRAANRAETPGRGIiLDRLVGAPETDQVAA 
LAELVRSHAAAVAGYDSADQLPERKAFKDLGFDSLAAVELRNRLGVTTGVRLPSTLV^ 
DHPTPLAVAEHLRSELFADSAPDVGVGARLDDLERALDALPDAQGHADVGARLEALIiR 
RWQSRRPPETE P VT I SDDASDDELFS MLDRRLGGGGD V " (SEQ ID NO: 14) 
misc_feature 22957..24237

/gene="megAII"
/function="KS3"
misc_feature 24544..25581

/gene="megAII"
/function="AT3"
misc_feature 26230..26733

/gene="megAII"
/function="KR3 (inactive)"
misc_feature 26998..27258

/gene="megAII"
/function="ACP3"
misc_feature 27393..28590

/gene="megAII"
/function="KS4"
misc_feature 28897..29931

/gene="megAII"
/function="AT4"
misc_feature 29953..30477

/gene="megAII"
/function="DH4"
misc_feature 31396..32244

/gene="megAII"
/function="ER4"
misc_feature 32257..32799

/gene="megAII"
/function="KR4"
misc_feature 33052..33312

/gene="megAII"
/function="ACP4"
gene 33666..43271

/gene="megAIII"
CDS 33666..43271

/gene="megAIII"
/note="polyketide synthase"
/codon_start=1
/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 3"

/translation="MSESSGMTEDRLRRYLKRTVAELDSVTGRLDEVEYRAREPIAW

GMACRFPGGVDSPEAFWEFIRDGGDAIAEAPTDRGWPPAPRPRIiGGLIiAEPGAFDAAF. 

FGISPREALATDPQQRLMLEISWEALERAGFDPSSLRGSAGGVFTGVGAVDYGPRPDE 

APEEVIiGYVGIGTASSVASGRVAYTIfGLEGPAVTVDTACSSGLTAVHIiAMESLRRDEC 

TLVI^GGVTVWSSPGAFTEFRSQGGLAEDGRCKPFSRAAIXjFGliAEGAGv^ 

ARAEGRP\n^VLRGSAINQDGASNGLTAPSGPAQRRVIRQALERARLRPVDVDYVEAH 

GTGTRIX3DPIEAHAIiLDTYGADREPGRPLWGSVKSWIGHTQAAAGVAGVMKTVlJUiR 

HREIPATIiHFDEPSPHVDWDRGAVSWSETRPWPVGERPRRAGVSSFGISGXNAHVIV 
EEAPSPQAADLDPTPGPATGATPGTDAAPTAEPGAEAVALVFSARDERALRAQAARLA
DRLTDDPAPSLRDTAFTLVTRRATWEKRAWVGGGEEVLAGLRAVAGGRPVDGAVSGR 
ARAGRRWLVFPGQGAQWQGMARDLLRQS PTFAES IDACERALAPHVDWSLREVLDGE 
QSLDPVDyVOPVLFAVl'TVSLARLWQSYGVTPGAWGHSQGEIAAAHVAGAIiSliADAAR 
VVALRSRVLRRLGGHGGf'lAS FGLHPDQAAERIARFAGALrTVASVNGPRSWLAGENGP 
LDELIAECEAEGVTARRIPVDYASHSPQVESLREELLAALAGVRPVSAGIPLYSTLTG 
QVIETA^lDADYWFANLREPVRFQDATRQLAEAGFDAFVEVSPHPVLTVGVEATLiEAV 
LPPDADPCVTGTLRRERGGLAQFHTAJUAEAYTRGVEVDWRTAVGEGRPVDLPVYPFQR 
QNFWLPVPLGRVPDTGDEWRYQLAWHPVDLGRSSLAGRVLVVTGAAVPPAWTDVVRDG 
LEQRG ATWLCT AQ S RAR I G AALDAVDGTAL, ST WS LLALAEGG AVDD PS LDTLALVQ 
ALG AAG ID VPLWLVTRD AAAVTVGDD VD PAQAMVGGLGR WG VES P AR WGGLVDLRE A 

DADSARSLAAILADPRGEEQFAIRPDGVTVARLVPAPARAAGTRWTPRGTVLVTGGTG 
G I G AHLARWLAG AG AEH LVXiLNRRG AE AAGAADLRDE LV ALGTG VT I T ACD V ADRDRL 

AAVLD AARAQGR WT AVFHAAG I S RSTAVQE LTESEFTEI TD AKVRGTANLAELC PE L. 
D ALVL F S S MAAWIG S PG LAS Y AAGNAF LD AFARRGRRSGL P VTS I AWG LWAGQNMAGT 
EGGDYLRSQGLRAMDPORAIEELRTTLDAGDPWSWDLDRERFVELFTAARRRPLFD 
EUSGVRAGAEETGQESDLARRLASMPEAERHEHVARLVRAEVAAVLGHGTPTVIERDV 
AFRDLGFDSMTAVDLPJJRIJ^VTGVRVATTIVFDHPTVDRLTAHYLERIiVGEPEATTP 
AAAWPQAPGEADEPIAIVGMA.CRLAGGVRTPDQLWDFIVADGDAVTEMPSDRSWDLD 
AIjFDPDPERHGTSYSRHGAFLDGAADFDAAFFGISPREALAMDPQQRQVLiETTWELFE 
NAGIDPHSLRGTDTGVFLGAAYQGYGQNAQVPKESEGYLLTGGSSAVASGRIAYVLGL 
EGPAITVDTACSSSLVA-LHVAAGSLRSGDCGLAVAGGVSVMAGPEVFTEFSRQGAXiAP 
DGRCKPFSDOADGFGFAEGVAWLLQRLSVAVREGRRVXiGVWGSAVNQDGASNGLAA 
PSGVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGP 
VWGSVKANVGHVQAA^.GWGVIKVVI/5LGRGLVGPMVCRGGLSGLVDWSSGGLVVAD 
GVRGWPVGVDGVRRGGVSAFGVSGTNAHVWAEAPGSWGAERPVEGSSRGLVGVAGG 
WPVVLSAKTETAI/TEUARRLHDAVDDIVAL^ 

LtRDRLRAFTTGSAAPGWSGVASGGGWFVFPGQGGQWVGMARGLLSVPVFVESWEC 
DAWSSVVGFSVLGVLEGRSGAPSLDRVDWQPVLFVVMVSLARLWRWCGVVPAAVVG 
HSQGEIAAAWAGVLSVGDGARWALRARALRALAGHGGMVSLAVSAERARELIAPWS 
DRISVAAVNSPTSVWSGDPQAI^UU^VAHCAETGERAKTLPVDYASHSAHVEQIRDTI 
LTDLADVTAJiRPDVALYSTL.HGARGAGTDMDARYWYDNXjRSPVRFDEAVEAAVADGYR 
VFVEMSPHPVLTAAVQEIDDETVAIGSLHRDTGERHLVAELARAHVHGVPVDWRAILP 
ATHPVPLPNYPFEATRYWUAPTAADQVADHRYRVDVJRPLATTPAELSGSYLVFGDAPE 
TLGHSVEKAGGLLVPVAAPDRESLAVALDEAAGRIAG\TLSFAADTA.THLARHRX»IjGEA 
DVEAPLWLVTSGGVALDDHDPIDCDQAI4WGIGRWIGLETPHRWGGLVDVTVEPTAED 

gwfaallajuddhedqvalrdgirhgrrlvraplttrnarwtpagtalvtggtgalgg 

HVARYLARSGVTDLVLLSRSGPDAPGAAELAAELADLGAEPRVEACDVTDGPRIjRAIiV 

qelreqdrpvrivvkta.gvpdsrpldridelesvsaakvtgarlldelcpdadtfvlf 
s sg agwgs aniigayaaanayldalahrrrqagraats vawgawagdgmatgd ldglt % 

RRGLRAMAPDRALRACTRRWTTHDTCVS VADVDWDRFAVGFTAARPRPIjIDELVTSAP x 

VAAPTAAAAPVPAMTADQLLQFTRSHVAAILGHQDPDAVGLDQPFTEIiGFDSIjTAVGI* 

RNQLQQATGRTLPAALVFQHPTVRRLADHLAQQLDVGTAPVEATGSVLRDGYRRAGQT 

GDVRSYLDLLANLSEFRERFTDAASLGGQLELVDIxADGSGPVTVICCAGTAALSGPHE 

FARLASALRGTVPVPJUAQPGYEAGEPVPASMEAVLGVQADAVLAAQGDTPFVLVGHS 

AGALMAYALATELADRGHPPRGVVLLDVYPPGHQEA\raAWLGELTAAIiFDHETVRMDD 

TRLTALGAYDRLTGRWRPRDTGLPTLWAASEPMGEWPDDGWQSTWPFGHDRVTVPGD 

HFSMVQEHADAIARKIDAWLSGERA" (SEQ ID NO: 15) 

misc_feature 33780..35027

/gene="megAIII"
/function="KS5"

misc_feature 35385..36419

/gene="megAIII"
/function="AT5"

misc_feature 37068..37604

/gene="megAIII"
/function="KR5"

misc_feature 37860..38120

/gene="megAIII"
/function="ACP5"

misc_feature 38187..39470

/gene="megAIII"
/function="KS6"

misc_feature 39795..40811
/gene="megAIII"

/function="AT6"
misc_feature 41406..41936

/gene="megAIII"

/function="KR6"
misc_feature 42168..42425

/gene="megAIII"

/function="ACP6"
misc_feature 42585..43271

/gene="megAIII"

/function="TE"
gene 43268..44344

/gene="megCII"
CDS 43268..44344

/gene="megCII"

/codon_start=1

/transl_table=11

/product="TDP-4-keto-6-deoxyglucose 3,4-isomerase"

/translation="MNTTDRAVLfGRRLQMIRGLYWGYGSNGDPYPMLLCGHDDDPHRW

YRGLGGSGVRRSRTETWVVTDHATAVRVLDDPTFTRATGRTPEWMRAAGAPASTWAQP 

FRDVHAASWDAELPDPQEVEDRLTGIiLPAPGTRIiDLVRDLAWPMASRGVGADDPDVLR 

AAWDARVGLDAQLTPQPI^VTEAAIAAVPGDPHRRALFTAVEMTATAFVDAVLAVTAT 

AGAAQRLADDPDVAARLVAEVIjRIjHPTAHLERRTAGTETWGEHTVAAGDEVVVVVAA 

AMRDAGVFADPDRIiDPDRADADRALSAQRGHPGRIiEEIiVWLTTAALRSVAKALPGLT 
AGGPWRRRRSPVLRATAHCPVEL" (SEQ ID NO: 16) 

gene 44355..45623

/gene="megCIII"
CDS 44355..45623

/gene="megCIII"

/codon_start=1

/transl_table=11

/product="TDP-desosamine glycosyltransferase"
/translation="MRWFSSMASKSHLFGLVPLAWAFRAAGHEVRWASPALTDDIT
AAGLTAVPVGTDVDLVDFMTHAGYDIIDYVRSLDFSERDPATSTWDHLLGMQTVL.TPT 
FYALMSPDSLVEGMISFCRSWRPDWSSGPQTFAASIAATVTGVAHARLLWGPDITVRA 
RQKFLGLLPGQPAAHREDPLAEWLTWSVERFGGRVPQDVEELWGQWTIDPAPVGMRL 
DTGLRTVGMRYVDYNGPSWPDWLHDEPTRRRVCLTLGISSRENSIGQVSVDDLLGAL 
GDVDAE 1 1 ATVDEQQLEGVAHVPANIRTVGFVPMHALLPTCAATVHHGGPGSWHTAAI 
HGVPQVILPDGWDTGVRAQRTEDQGAGIALPVPELTSDQLREAVRRVLDDPAFTAGAA 
RMRADMLAEPSPAEWDVCAGLVGERTAVG" (SEQ ID NO: 17) 

gene 45620..46591

/gene="megBII"

CDS 45620..46591

/gene="megBII"
/codon_start=1
/transl_table=11

/product="TDP-4-keto-6-deoxyglucose 2,3 dehydratase"

/translation="MSTDATHVRLGRCALLTSRLWLGTAALAGQDDADAVRLLDHARS

RGVNCLDTADDDSASTSAQVAEESVGRWLAGDTGRREETVLSVTVGVPPGGQVGGGGL 

SARQIIASCEGSLRRLGVDHVDVLHLPRVDRVEPWDEVWQAVDAIjVAAGKVCYVGSSG 

FPGWHIVAAQEHAVRRHRLGLVSHQCRYDLTSRHPELEVIiPAAQAYGLGVFARPTRLG 

GLLGGDGPGAAAARASGQPTALRSA.VEAYEVFCRDLGEHPAEVALAWVL.SRPGVAGAV 

VG ARTPGRLDS ALRACG VAI/G ATELTAIiDG I FPG VAAAGAAPEAWLR " (SEQ ID NO: 18) 

gene complement(46660..47403)

/gene="megH"

CDS complement(46660..47403)

/gene="megH"

/note="putative thioesterase"
/codon_start=1
/transl_table=11
/product="TEII"

/translations M ^WTWLRRFGSADGHRAJUJYCFPHAGAAADSYLDIxAJUUJAPEVDV 
WAVQYPGRQDRRDERALGTAGEIADEVAAVLRDLVGEVPFALFGHSMGALVAYETARR 
LEARPGVRPLRLFVSGQTAPRVHERRTDLPDEDGLVEQMRRIX3VSEAAIADQGIiI/DMS 
LPVLRADHRVLRSYAWQA.GPPLRAGITTLCGDTDPLTTVEDAQRWLPYSWPGRTRTF

PGGHFYLAJDHVGEVAESVAPDLIjRLTPTG" (SEQ ID NO: 19) 
gene complement(47411..>47981)

/gene="megF"
CDS complement(47411..>47980)

/gene="megF"

/codon_start=1

/transl_table=11

/product="C-6 hydroxylase"

/translation="IRVQDDDADRLSRDELTSIALVLLLAGFEASVSLlGIGTYIiIiLT 
HPDQLALVRKDPALIiPGAVEEIIiRYQAPPETTTRFATAEVEIGGVTIPAYSTVLIANG 
AANRDPGQFPDPDRFDVTRDSRGHLTFGHGIHYCMGRPLAKLEGEVAIiGALFDRFPKL 
S LG F PS DEWWRRSLLLRG I DHLPVR PNG " (SEQ ID NO: 20) 

BASE COUNT 5962 a 16875 c 18045 g 7099 t

ORIGIN 

1 ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 
61 gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 
121 atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 
181 taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 
241 cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 
301 gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 
361 tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 
421 gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 
481 ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 
541 catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga
601 cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 
661 ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 
721 gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 
781 ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 
841 ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 
901 tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 
961 gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 
1021 atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 
1081 ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 
1141 gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 
1201 caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 
1261 accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 
1321 ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 
1381 gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 
1441 tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 
1501 accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 
1561 gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 
1621 gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 
1681 cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 
1741 gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 
1801 gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 
1861 cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 
1921 gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 
1981 ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 
2041 cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 
2101 caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 
2161 ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 
2221 cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 
2281 cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 
2341 cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 
2401 ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 
2461 ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg
2521 gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 
2581 ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 
2641 gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac
2701 cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 
2761 gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 
2821 tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc
2881 catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga
2941 cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt
3001 gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 
3061 ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 
3121 ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 
3181 gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgccc 
3241 cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 
3301 catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 
3361 cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 
3421 aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 
3481 tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 
3541 tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 
3601 acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 
3661 tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 
3721 gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 
3781 tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 
3841 tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 
3901 tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 
3961 tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 
4021 cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 
4081 ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 
4141 tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 
4201 ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cctcctgccg ggtgtctacg 
4261 ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 
4321 ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg
4381 tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 
4441 ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 
4501 tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 
4561 gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 
4621 cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 
4681 gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 
4741 agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 
4801 atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 
4861 gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 
4921 ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac
4981 ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 
5041 gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 
5101 ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 
5161 gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 
5221 tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 
5281 ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc
5341 cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 
5401 gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac
5461 tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 
5521 gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 
5581 tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 
5641 gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc
5701 atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 
5761 atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 
5821 catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 
5881 catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct
5941 cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 
6001 gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct
6061 gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 
6121 cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 
6181 caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 
6241 cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 
6301 cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 

6361 caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt
6421 ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc
6481 ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca
6541 cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 
6601 gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 
6661 ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 
6721 ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac
6781 ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg
6841 gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 
6901 ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 
6961 tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 
7021 gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 
7081 ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 
7141 atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 
7201 cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 
7261 cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 
7321 cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 
7381 agcgctcgcg gaggtggtcg cggacgcccg ggcggtctcc ccgttcgccg cccagatcag 
7441 gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 
7501 cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 
7561 cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 
7621 gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 
7681 ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 
7741 ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 
7801 cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 
7861 cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 
7921 acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 
7981 ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 
8041 tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 
8101 ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 
8161 ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg
8221 gtccgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 
8281 ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 
8341 gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 
8401 cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg
8461 ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 
8521 ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 
8581 cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 
8641 ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 
8701 gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 
8761 tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 
8821 ggtgcgtctg gtagatgccg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 
8881 cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 
8941 ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 
9001 cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 
9061 tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 
9121 cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 
9181 cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 
9241 gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 
9301 cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 
9361 gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 
9421 cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 
9481 cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc
9541 ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 
9601 caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 
9661 ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 
9721 cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 
9781 ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 
9841 gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 
9901 gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 
9961 gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 
10021 ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 
10081 gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 
10141 gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 
10201 gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 

10261 gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc
10321 gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc
10381 cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa
10441 gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 
10501 gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 
10561 cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 
10621 ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct
10681 ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 
10741 gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 
10801 ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 
10861 ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc
10921 gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 
10981 ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 
11041 ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc
11101 ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 
11161 cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 
11221 gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 
11281 ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 
11341 tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 
11401 gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 
11461 tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 
11521 cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 
11581 tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact
11641 gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 
11701 ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 
11761 ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 
11821 ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 
11881 ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 
11941 cgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 
12001 tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 
12061 agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 
12121 tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 
12181 gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 
12241 ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 
12301 gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 

12361 gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag
12421 ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt
12481 gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc
12541 ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 
12601 gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 
12661 cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 
12721 gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 
12781 ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 
12841 ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 
12901 tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 
12961 gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 
13021 gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 
13081 tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 
13141 ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 
13201 ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 
13261 cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 
13321 gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 
13381 ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 
13441 ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 
13501 tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 
13561 gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 
13621 gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 
13681 gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 
13741 tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 
13801 ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 
13861 gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 
13921 ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 
13981 gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 
14041 ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 
14101 ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 
14161 gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 
14221 ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 
14281 accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 

14341 ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 
14401 cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 
14461 acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 
14521 aaggcgttct cggccgccgc cgacgggttc ggcacggccg aaggcgcagg gatgctcctg 
14581 ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 
14641 accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 
14701 gtccgggcga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 
14761 gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 
14821 gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 
14881 gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 
14941 gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 
15001 tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 
15061 cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 
15121 gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 
15181 ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 
15241 cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 
15301 gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 
15361 gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 
15421 cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 
15481 ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 
15541 tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 
15601 cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 
15661 tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 
15721 cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 
15781 gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 
15841 atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 
15901 gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 
15961 gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 
16021 tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 
16081 ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 
16141 ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 
16201 cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 
16261 accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 
16321 ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 
16381 gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 
16441 ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 
16501 gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 
16561 ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 
16621 gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 
16681 ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 
16741 ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 
16801 ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 
16861 cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 
16921 ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 
16981 cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 
17041 cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 
17101 acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 
17161 ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 
17221 gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 
17281 gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 
17341 gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 
17401 ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 
17461 ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 
17521 ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 
17581 ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 
17641 cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 
17701 gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 
17761 cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 
17821 gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 
17881 ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 
17941 gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 
18001 cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 
18061 gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 
18121 actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 
18181 ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 
18241 gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 
18301 ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 
18361 accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 
18421 acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg 
18481 gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc 
18541 ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt 
18601 acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag 
18661 cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt 
18721 cggatcgcgc acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg 
18781 tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg 
18841 gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg 
18901 cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc 
18961 ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag 
19021 gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat 
19081 gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt 
19141 gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggctg 
19201 ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg 
19261 ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt 
19321 gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg 
19381 tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat 
19441 ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt 
19501 ggggcgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg 
19561 gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg 
19621 ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc 
19681 gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag 
19741 gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg 
19801 gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg 
19861 tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg 
19921 cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg 
19981 tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg 
20041 ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg 
20101 ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc 
20161 gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg 
20221 cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc 
20281 cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc 
20341 gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg 
20401 gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc 
20461 cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag 
20521 ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc 
20581 gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc 
20641 gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg 
20701 ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc 
20761 ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc 
20821 cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc 
20881 acctatccct tccagggacg gcggttctgg ctgcaccccg accgtggtcc gcgtgacgat 
20941 gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga 
21001 ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg 
21061 gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag 
21121 gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac 
21181 ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca 
21241 ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag 

21301 gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc 
21361 ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc 
21421 gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc 

21481 cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc 
21541 gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa 
21601 cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg 
21661 gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag 
21721 tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc 
21781 cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc 
21841 gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc 
21901 gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg 
21961 tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg 
22021 cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag 
22081 gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa 
22141 gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg 
22201 gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 
22261 acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 
22321 ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 
22381 gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 
22441 ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 
22501 ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 
22561 ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 
22621 ctggccgcgc cggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 
22681 gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 
22741 ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 
22801 cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 
22861 ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 
22921 cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 
22981 gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 
23041 gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 
23101 cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 
23161 gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac 
23221 ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 
23281 ccgcactccc tgcgcggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 
23341 ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc 
23401 gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 
23461 gacaccgcgt gctcgtcgtc gtcggtggcg ctgcacctgg cggtcgagtc gctgcggctg 
23521 ggcgagtcga gtcccgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 
23581 gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 
23641 gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 
23701 gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 
23761 gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 
23821 caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 
23881 accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 
23941 gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 
24001 caggcggcgg cgggrgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 
24061 ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 
24121 gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 
24181 gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 
24241 cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 
24301 cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 
24361 gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 
24421 cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 
24481 gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 
24541 gtcgtcttcc tctccccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 
24601 gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 
24661 caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 
24721 gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 
24781 tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 
24841 cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 
24901 ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 
24961 gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 
25021 cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 
25081 gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 
25141 gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 
25201 atcaccttct acccgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 
25261 tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 
25321 gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 
25381 gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 
25441 ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 
25501 gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 
25561 ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 
25621 gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 
25681 tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 
25741 accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 
25801 ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 
25861 accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 
25921 ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc 
25981 gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc 
26041 cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 
26101 gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 
26161 cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 
26221 gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 
26281 ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg 
26341 ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 
26401 gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 
26461 gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 
26521 gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 
26581 gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 
26641 tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 
26701 gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 
26761 ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 
26821 ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 
26881 gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 
26941 gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga 
27001 atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 
27061 gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 
27121 gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 
27181 ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 
27241 tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 
27301 accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 
27361 gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc 
27421 cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 
27481 accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 
27541 ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 
27601 atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 
27661 accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 
27721 gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 
27781 gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 
27841 ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 
27901 gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 
27961 gggctcgccg ccgacgggcg gtgcaaggcg tcctccgcgc aggccgacgg gttcgccctc 
28021 gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 
28081 caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 
28141 gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 
28201 ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 
28261 ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 
28321 ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 
28381 atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 
28441 ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 
28501 tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 
28561 aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 
28621 gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 
28681 caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 
28741 gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 
28801 gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 
28861 gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 
28921 tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 
28981 atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 
29041 cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 
29101 gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 
29161 gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 
29221 gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 
29281 ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 
29341 gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 
29401 gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc 
29461 gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc 
29521 gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg 
29581 gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg 
29641 gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag 
29701 gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc 
29761 ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc 
29821 aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac 
29881 ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 
29941 caccgcaggg ccgacacccc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac 
30001 gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 
30061 gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc 
30121 ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 
30181 gtcctgcagc agccgctggt gttgaccgcc gccggtgcgc cgctgcgccc gtcggtcggc 
30241 gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 
30301 ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 
30361 ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccccgac gttgaccgac 
30421 cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 
30481 gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 
30541 gggtacgcgt tcgacccggt gctgcccgac gccgtcgccc agaccttcgg cctgaccagt 
30601 cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 
30661 actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 
30721 gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 
30781 gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 
30841 gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 
30901 ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 
30961 gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 
31021 tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 
31081 gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 
31141 cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg 
31201 cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 
31261 cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggccggtg 
31321 cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 
31381 cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 
31441 gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 
31501 gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 
31561 ctgttccagg gggccttcgg gccggcggcg gtcgccgacc accggctcct caccccggcc 
31621 cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 
31681 tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc 
31741 gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 
31801 gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 
31861 atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 
31921 ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 
31981 ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 
32041 ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 
32101 atcctggagg aggtcgtcgg tccgctggcc gccggtgccc tcgaccggtt gccggtgtcg 
32161 gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 
32221 ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 
32281 ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta 
32341 ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 
32401 gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg 
32461 gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 
32521 cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 
32581 caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 
32641 gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 
32701 ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 
32761 ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 
32821 ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 
32881 gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 
32941 aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 
33001 acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 
33061 ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgccc gcacgcggcg 
33121 gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgct caaggacctc 
33181 gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 
33241 cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 
33301 cggccggagt tgttcgccga ccccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 
33361 ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 
33421 ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 
33481 atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 
33541 ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 
33601 acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg 






33661 atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca 
33721 ccgccgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 
33781 aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 
33841 cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 
33901 gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg 
33961 acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc 
34021 tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 
34081 gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 
34141 acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 
34201 ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 
34261 gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 
34321 ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 
34381 gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 
34441 gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg 
34501 ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 
34561 gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 
34621 agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 
34681 ggctgggcga cccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 
34741 ccggccgccc gctccgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 
34801 cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 
34861 cgttgcacct cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 
34921 tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 
34981 tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 
35041 ccgaccccga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc 
35101 ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg 
35161 ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 
35221 tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg 
35281 tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg 
35341 tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 
35401 ggcagggcgc acaccggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 
35461 cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg 
35521 aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 
35581 cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg 
35641 tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 
35701 acgccgccag ggtggcggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 
35761 ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 
35821 gtgcgctgac cgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 
35881 gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 
35941 ccgtcgacta cgccccacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 
36001 cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 
36061 aggtcatcga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg 
36121 tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 
36181 tcagcccgca ccccgcgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc 
36241 ccgacgcgga tccgcgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 
36301 tccacaccgc gcccgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 
36361 tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 
36421 tcccggtccc cctgggcccg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 
36481 ggcaccccgt cgaccccggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag 
36541 cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 
36601 ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 
36661 acggcaccgc cctgrccacc gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 
36721 acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 
36781 tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 
36841 cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 
36901 ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 
36961 tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 
37021 cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 
37081 tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 
37141 cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 
37201 acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 
37261 ccgaccgcga ccggccggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 
37321 cggcggtgtt ccacgccgcc cggatctccc ggtccacagc ggtacaggag ctgaccgaga 
37381 gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 
37441 gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 
37501 ggctggcctc ctacgcggcg ggcaacgcct 
37561 gcagtgggct gccggtcacc tcgatcgcct 
37621 gtaccgaggg cggcgactac ctgcgcagcc 
37681 cgatcgagga gctgcggacc accctggacg 
37741 tggaccggga gcggttcgtc gaactgttca 
37801 aactcggtgg ggtccgcgcc ggggccgagg 
37861 ggctggcgtc gatgccggag gccgaacgtc 
37921 aggtggcagc ggtgctgggc cacggcacgc 
37981 gtgacctggg attcgactcc atgaccgccg 
38041 ccggggtccg ggtggccacg accatcgtct 
38101 cgcactacct ggaacgactc gtcggtgagc 
38161 tcccgcaggc acccggggag gccgacgagc 
38221 tcgccggtgg agtgcgtacc cccgaccagt 
38281 cggtcaccga gatgccgtcg gaccggtcct 
38341 ccgagcggca cggcaccagc tactcccggc 
38401 tcgacgcggc gttcttcggg atctcgccgc 
3 8461 ggcaggtcct ggagacgacg tgggagctgt 
38521 tgcgcggtac ggacaccggt gtcttcctcg 
38581 cgcaggtgcc gaaggagagt gagggttacc 
38641 ccggtcggat cgcgtacgtg ttggggttgg 
38701 gttcgtcgtc gcttgtggcg ttgcacgtgg 
38761 ggctcgcggt ggcgggtggg gtgtcggtga 
38821 ccaggcaggg cgcgctggcc cccgacggtc 
38881 ggttcggatt cgccgagggc gtcgctgtgg 
38941 gggaggggcg tcgggtgttg ggtgtggtgg 
39001 gtaatgggtt ggcggcgccg tcgggggtgg 
39061 gtcgtgcggg tgtgtcgggt ggggatgtgg 
39121 ggttggggga tccggtggag ttgggggcgt 
39181 gggtgggtcc ggtggtggtg ggttcggtga 
39241 cgggtgtggt gggtgtgatc aaggtggtgt 
39301 tggtgtgtcg gggtgggttg tcggggttgg 
39361 cggatggggt gcgggggtgg ccggtgggtg 
39421 cgtttggggt gtcggggacg aatgctcatg 
39481 tgggggcgga acggccggtg gaggggtcgt 
39541 tggtgccggc ggtgctgtcg gcaaagaccg 
39601 tgcacgacgc cgtcgacgac accgtcgccc 
39661 gacgcgccca cctgccctac cgggccgccc 
39721 acaggctgcg ggcgttcacc actggttcgg 
39781 cgggcggcgg tgtggtgttt gtttttcctg 
39841 gggggttgtt gtcggttccg gtgtttgtgg 
39901 cgtcggtggt ggggttttcg gtgttggggg 
39961 tggatcgggt ggatgtggtg cagccggtgt 
40021 tgtggcggtg gcgcggggtt gtgcctgcgg 
40081 cggcggcggt ggtggcgggg gtgttgtcgg 
40141 gggcgcgggc gttgcgggcg ttggccggcc 
40201 ccgaacgcgc ccgggagctg atcgcaccct 
40261 actccccgac ctcggtggtg gtctcgggtg 
40321 actgcgccga gaccggtgag cgggccaaga 
40381 cccacgtcga acagatccgc gacacgatcc 
40441 gacccgacgt cgccctctac tccacgctgc 
40501 acgcccggta ctggtacgac aacctgcgct 
40561 ccgccgtcgc cgacggctac cgggtcttcg 

40621 ccgcggtgca ggagatcgac gacgagacgg
40681 gcgagcggca cctggtcgcc gaactcgccc 
40741 ggcgggcgat cctccccgcc acccacccgg 
40801 cccggtactg gctcgccccg acggcggccg 
40861 actggcggcc cctggccacc accccggcgg 
40921 acgccccgga gaccctcggc cacagcgtcg
40981 ccgctcccga ccgggagtcc ctcgcggtcg 
41041 gtgtgctctc cttcgccgcc gacaccgcca 
41101 aggccgacgt cgaggcccca ctctggctgg 
41161 acgacccgat cgactgcgac caggcaatgg 
41221 agaccccgca ccggtggggc ggcctggtgg 
41281 gggtggtctt cgccgccctc ctggccgccg 



tcctcgacgc cttcgcccgt cgtggtcggc 
SSBgtctgtg 99ccgggcag aacatggccg 
agggcctgcg cgccatggac ccgcagcggg 
ccggggaccc gtgggtgtcg gtggtggacc 
ccgccgcccg ccgccggccc ctcttcgacg 
agaccggtca ggaatcggat ctcgcccggc 
acgagcatgt cgcccggctg gtccgagccg 
cgacggtgat cgagcgtgac gtcgccttcc 
tcgacctgcg gaaccggctc gcggcggtga 
tcgaccaccc gacagtggac cgcctcaccg 
cggaggcgac gaccccggct gcggcggtcg 
cgatcgcgat cgtcgggatg gcctgccgcc 
tgtgggactt catcgtcgcc gacggcgacg 
gggacctcga cgcgctgttc gacccggacc 
acggcgcgtt cctggacggg gcggccgact 
gtgaggcgtt ggcgatggat ccgcagcagc 
tcgagaacgc cggcatcgac ccgcactccc 
gcgctgcgta ccaggggtac ggccagaacg 
tgctcaccgg tggttcctcg gcggtcgcct 
aggggccggc gatcactgtg gacacggcgt 
cggccgggtc gctgcgatcg ggtgactgtg 
tggccggtcc ggaggtgttc accgagttct 
ggtgcaagcc cttctccgac caggccgacg 
tgctcctgca gcggttgtcg gtggcggtgc 
tgggttcggc ggtgaatcag gatggggcga 
cgcagcagcg ggtgattcgg cgggcgtggg 
gtgtggtgga ggcgcatggg acggggacgc 
tgttggggac gtatggggtg ggtcggggtg 
aggcgaatgt gggtcatgtg caggcggcgg 
tggggttggg tcgggggttg gtgggtccga 
tggattggtc gtcgggtggg ttggtggtgg 
tggatggggt gcgtcggggt ggggtgtcgg 
tggtggtggc ggaggcgccg gggtcggtgg 
cgcgggggtt ggtgggggtg gctggtggtg 
aaaccgccct gaccgagctc gcccgacgac 
tcccggcggt ggccgccacc ctcgccaccg 
tgctggcccg cgaccacgac gaactgcgcg 
cggctcccgg tgtggtgtcg ggggtggcgt 
gtcagggtgg tcagtgggtg gggatggcgc 
agtcggtggt ggagtgtgat gcggtggtgt 
tgttggaggg tcggtcgggt gcgccgtcgt 
tgttcgtggt gatggtgtcg ttggcgcggt 
cggtggtggg tcattcgcag ggggagatcg 
tgggtgatgg tgcgcgggtg gtggcgttgc 
acggcggcat ggtctccctc gcggtctccg 
ggtccgaccg gatctcggtg gcggcggtca 
acccacaggc cctcgccgcc ctcgtcgccc 
cgctgcctgt ggactacgcc tcccactccg 
tcaccgacct ggccgacgtc acggcgcgcc 
acggcgcccg gggcgccggc acggacatgg 
caccggtgcg cttcgacgag gccgtcgagg 
tcgagatgag cccacacccg gtcctcaccg 
tggccatcgg ctcgctgcac cgggacaccg 
gggcccacgt gcacggcgta ccagtggact 
ttcccctgcc gaactacccg ttcgaggcga 
accaggtcgc cgaccaccgc taccgcgtcg 
agctgtccgg cagctacctc gtcttcggcg 
agaaggccgg cgggctcctc gtcccggtgg 
ccctggacga ggcggccgga cgactcgccg 
cccacctggc ccggcaccga ctcctcggcg 
tcaccagcgg cggcgtcgca ctcgacgacc 
tgtgggggat cggacgggtg atgggtctgg 
acgtgaccgt cgaacccacc gccgaggacg 
acgaccacga ggaccaggtg gcgctgcgcg 
41341 acggcatccg ccacggccga cggctcgtcc
41401 ggacaccggc gggcacggcg ctcgtcacgg 
41461 cgcggtacct ggcccggtcc ggggtgaccg 
41521 acgcacccgg tgccgccgaa ctggccgccg 
41581 tcgaggcgtg cgacgtcacc gacgggccac 
41641 aacaggaccg gccggtccgg atcgtcgtcc 
41701 tcgaccggat cgacgaactg gagtcggtca 
41761 tcgacgagct ctgcccggac gccgacacct 
41821 ggggtagcgc gaacctgggc gcgtacgcgg 
41881 accgccgccg ccaggcgggc cgggccgcga
41941 acggcatggc caccggcgac ctcgacgggc 
42001 cggaccgggc gctgcgcgcc tgcaccaggc 
42061 tagccgacgt cgactgggac cgcttcgccg 
42121 tgatcgacga actcgtcacc tccgcgccgg 
42181 tcccggcgat gaccgccgac cagctactcc 
42241 tcggtcacca ggacccggac gcggtcgggt 
42301 actcgctcac cgccgtcggc ctgcgcaacc 
42361 ccgccgccct ggtgttccag caccccacgg 
42421 agctcgacgt cggcaccgcc ccggtcgagg 
42481 ggcgggccgg gcagaccggc gacgtccggt
42541 agttccggga gcggctcacc gacgcggcga 
42601 tggccgacgg atccggcccg gtcactgtga 
42661 ggccgcacga gttcgcccga ctcgcctcgg 
42721 tcgcgcaacc cgggtacgag gcgggtgaac 
42781 gggtgcaggc ggacgcggtc ctcgcggcac
42841 actcggcggg ggccctgatg gcgtacgccc 

42901 cgccacgtgg cgtcgtgctc ctcgacgtgt
42961 cctggctcgg cgagctgacc gccgccctgt 
43021 cccggctcac ggccctgggg gcgtacgaca 
43081 ccggtctgcc cacgctggtg gtggccgcca 
43141 gttggcagtc cacgtggccg ttcgggcacg 

43201 cgatggtgca ggagcacgcc gacgcgatcg
43261 agagggcatg aacacgaccg atcgcgccgt 
43321 actgtactgg ggttacggca gcaacggaga 
43381 cgacgacccg caccgctggt accgggggct
43441 cgagacgtgg gtggtgaccg accacgccac 
43501 cacccgggcc accggccgga cgccggagtg
43561 ctgggcgcag ccgttccgtg acgtgcacgc 
43621 gcaggaggtg gaggaccggc tgacgggtct
43681 ggtccgcgac ctcgcctggc cgatggcgtc 
43741 gctgcgcgcc gcgtgggacg cccgggtcgg
43801 ggcggtgacc gaggcggcga tcgccgcggt 
43861 caccgccgtc gagatgacag ccaccgcgtt 
43921 ggcgggggcg gcccagcgtc tcgccgacga
43981 ggtgctgcgc ctgcatccga cggcgcacct 
44041 ggtgggcgag cacacggtcg cggcgggcga
44101 ccgtgacgcg ggggtcttcg ccgacccgga
44161 ccgggccctg tccgcccagc gcggtcaccc 
44221 gaccaccgcc gcactgcgca gcgtcgccaa 
44281 ggtcgtcagg cgacgtcgtt caccggtcct 
44341 ctgaggtgcc tgcgatgcgc gtcgtcttct 
44401 gtctcgttcc cctcgcctgg gccttccgcg 
44461 caccggctct caccgacgac atcacggcgg 
44521 acgtcgacct tgtcgacttc atgacccacg 
44581 gcctggactt cagcgagcgg gacccggcca 
44641 agaccgtcct caccccgacc ttctacgccc 
44701 tgatctcctt ctgtcggtcg tggcgacccg 
44761 cgtcgatcgc ggcgacggtg accggcgtgg
44821 tcacggtacg ggcccggcag aagttcctcg 
44881 gggaggaccc cctcgccgag tggctcacct 
44941 cgcaggacgt cgaggagctg gtggtcgggc 
45001 tgcgcctcga caccgggctg aggacggtgg 
45061 cggtggtgcc ggactggctg cacgacgagc 
45121 gcatctccag ccgggagaac agcatcgggc 



gcgccccgct gaccacccga aacgccaggt 
gcggtacggg tgccctcggc ggccacgtcg 
atctcgtcct gctcagcagg agcggccccg 
aactggccga cctcggggcc gagccgagag 
gcctgcgcgc cctggtgcag gagctacggg 
acaccgcagg ggtgcccgac tcccgtcccc 
gcgccgcgaa ggtgaccggg gcgcggctgc 
tcgtcccgtt ctcctcgggg gcgggagtgt 
cagccaacgc ctacctggac gccctggccc 
cctcggtcgc ctggggggcg tgggccggcg 
tgacccggcg cggtctgcgg gcgatggcac 
gttggaccac ccacgacacc tgtgtgtcgg 
tgggtttcac cgccgcccgg cccagacccc 
tggccgcccc caccgctgcg gcggccccgg 
agttcacgcg ctcgcacgtg gccgcgatcc 
tggaccagcc cttcaccgag ctgggcttcg 
agctccagca ggccaccggg cggacgctgc 
tacgcagact cgccgaccac ctcgcgcagc 
cgacgggcag cgtcctgcgg gacggctacc 
cgtacccgga cctgctggcg aacctgtcgg 
gcctgggcgg acagctggaa ctcgtcgacc 
tctgttgcgc gggcactgcg gcgctctccg 
cgctgcgcgg caccgtgccg gtgcgcgccc 
cggtgccggc gccgatggag gcagtgctcg 
agggcgacac gccgttcgtg ctggtcggac 
tggcgaccga gctggccgac cggggccacc 
acccacccgg tcaccaggag gcggtgcacg 
tcgaccacga gaccgtacgg atggacgaca 
ggctgaccgg caggtggcgt ccgagggaca
gcgagccgat gggggagtgg ccggacgacg 
acagggtcac ggtgcccggt gaccacttct 
c 9 c 99cacat cgacgcctgg ttgagcgggg 
gctgggccga cgactccaga tgatccgggg 
cccgtacccg atgctgttgt gcgggcacga 
999cggatcc ggggtccggc gcagccgtac 
cgccgtgcgg gtgctcgacg acccgacctt 
gatgcgggcc gcgggcgccc cggcctcgac 
cgcgtcctgg gacgccgaac tgcccgaccc 
cctgcctgcc ccggggaccc gcctggacct 
gcggggggtc ggcgcggacg accccgacgt 
cctcgacgcc cagctcaccc cgcagcccct 
gcccggggac ccgcaccggc gggcgctgtt 
cgtcgacgcg gtgctggcgg tgaccgccac 
ccccgacgtc gccgcccgtc tcgtcgcgga 
ggaacggcgt accgccggca ccgagacggt 
cgaggtcgtc gtggtggtcg ccgccgccaa 
ccgcctcgac ccggaccggg ccgacgccga 
cggccggttg gaggagctgg tggtggtcct 
ggcgctgccc ggtctcaccg ccggtggccc 
gcgagccacc gcccactgcc cggtcgaact 
cctccatggc cagcaagagc cacctgttcg 
cggcgggcca cgaggtacgg gtcgtcgcct 
ccggactgac ggccgtaccg gtcggcaccg 
ccgggtacga catcatcgac tacgtccgca
cctccacctg ggaccacctg ctcggcatgc 
tgatgagccc ggactcgctg gtcgagggca 
actggtcgtc tggaccgcag accttcgccg 
cccacgcccg actcctgtgg ggacccgaca 
ggctgctgcc cggacagccc gccgcccacc 
ggtctgtgga gaggttcggc ggccgggtgc 
agtggacgat cgaccccgcc ccggtcggga 
gcatgcgcta cgtcgactac aacggcccgt 
cgacccgccg acgggtctgc ctcaccctgg 
aggtctccgt cgacgacctg ttgggtgcgc 
45181 tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg
45241 cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 
45301 cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 
45361 gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg
45421 aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 
45481 aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg
45541 ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 
45601 gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc
45661 cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga
45721 cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 
45781 cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc
45841 cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 
45901 cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 
45961 cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt
46021 ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg
46081 ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 
46141 ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca
46201 tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc
46261 gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc
46321 gggacagccg acggcactgc gctcggcggc ggaggcgtac gaggtgttct gcagagacct 
46381 cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 
46441 ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt
46501 cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 
46561 aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt
46621 gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg
46681 cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 
46741 cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc
46801 gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 
46861 cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 
46921 cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 
46981 catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 
47041 ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 
47101 acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 
47161 ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 
47221 gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 
47281 ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 
47341 gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 
47401 catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 
47461 accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 
47521 acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 
47581 ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 
47641 ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 
47701 acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 
47761 ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 
47821 tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 
47881 ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 
47941 cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 

(SEQ ID NO: 1) 
SEQUENCE LISTING

<110> Kosan Biosciences, Inc. 

<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned 
<141> Herewith 

<150> US 60/158,305
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 47981 
<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS 

<222> (1) ... (144) 

<223> megBVI (megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase;
SEQ ID NO: 2= translated amino acid sequence 

<221> CDS 

<222> (928) . . . (2061)

<223> megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase,
TDP-4-keto-6-deoxyhexose 3,4-isomerase;
SEQ ID NO: 3= translated amino acid sequence 

<221> CDS

<222> (2072) . . . (3382)

<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyl transferase;
SEQ ID NO: 4= translated amino acid sequence

<221> CDS

<222> (3462) . . . (4634)

<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5= translated amino acid sequence

<221> CDS

<222> (4651) . . . (5775)

<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog),
TDP-3-keto-6-deoxyhexose 3-aminotransaminase;
SEQ ID NO: 6= translated amino acid sequence

<221> CDS

<222> (5822) . . . (6595)

<223> megDIII, daunosaminyl-N,N-dimethyltransferase (eryCVI homolog);
SEQ ID NO: 7= translated amino acid sequence

<221> CDS

<222> (6592) . . . (7197) 

<223> megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3,5-epimerase;
SEQ ID NO: 8= translated amino acid sequence

<221> CDS 

<222> (7220) . . . (8206) 

<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog),
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 9= translated amino acid sequence

<221> CDS 

<222> (8228) . . . (9220) 

<223> megBII-1 (megDVII), TDP-4-keto-L-6-deoxy-hexose 2,3-reductase;
SEQ ID NO: 10= translated amino acid sequence 

<221> CDS 

<222> (9226) . . . (10479)

<223> megBV, mycarosyl transferase, mycarose glycosyltransferase;
SEQ ID NO: 11= translated amino acid sequence 

<221> CDS 

<222> (10483) . . . (11424) 

<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12= translated amino acid sequence 

<221> CDS 

<222> (12181) . . . (22821) 

<223> megAI; SEQ ID NO: 13= translated amino acid sequence 

<221> misc_feature 
<222> (12505) . . . (13470)
<223> megAI, AT-L 

<221> misc_feature 
<222> (13576) . . . (13791) 
<223> megAI, ACP-L 

<221> misc_feature 
<222> (13849) . . . (15126) 
<223> megAI, KS1 

<221> misc_feature 
<222> (15427) . . . (16476) 
<223> megAI, AT1

<221> misc_feature
<222> (17155) . . . (17694) 
<223> megAI, KR1 

<221> misc_feature
<222> (17947) . . . (18207) 
<223> megAI, ACP1

<221> misc_feature 
<222> (18268) . . . (19548) 
<223> megAI, KS2 

<221> misc_feature
<222> (19876) . . . (20910)
<223> megAI, AT2

<221> misc_feature
<222> (21517) . . . (22053)
<223> megAI, KR2 

<221> misc_feature
<222> (22318) . . . (22575) 
<223> megAI, ACP2 

<221> CDS 

<222> (22867) . . . (33555) 

<223> megAII; SEQ ID NO: 14= translated amino acid sequence 

<221> misc_feature 
<222> (22957) . . . (24237) 
<223> megAII, KS3 

<221> misc_feature
<222> (24544) . . . (25581) 
<223> megAII, AT3 

<221> misc_feature 
<222> (26230) . . . (26733) 
<223> megAII, KR3 (inactive) 

<221> misc_feature 
<222> (26998) . . . (27258) 
<223> megAII, ACP3 

<221> misc_feature
<222> (27393) . . . (28590) 
<223> megAII, KS4 

<221> misc_feature
<222> (28897) . . . (29931)
<223> megAII, AT4

<221> misc_feature 
<222> (29953) . . . (30477)
<223> megAII, DH4 

<221> misc_feature 
<222> (31396) . . . (32244) 
<223> megAII, ER4 

<221> misc_feature
<222> (32257) . . . (32799) 
<223> megAII, KR4 

<221> misc_feature 
<222> (33052) . . . (33312) 
<223> megAII, ACP4 

<221> CDS 

<222> (33666) . . . (43271) 

<223> megAIII; SEQ ID NO: 15= translated amino acid sequence

<221> misc_feature
<222> (33780) . . . (35027)
<223> megAIII, KS5

<221> misc_feature
<222> (35385) . . . (36419)
<223> megAIII, AT5

<221> misc_feature
<222> (37068) . . . (37604) 
<223> megAIII, KR5 

<221> misc_feature 
<222> (37860) . . . (38120)
<223> megAIII, ACP5 

<221> misc_feature 
<222> (38187) . . . (39470) 
<223> megAIII, KS6 

<221> misc_feature 
<222> (39795) . . . (40811) 
<223> megAIII, AT6

<221> misc_feature 
<222> (41406) . . . (41936) 
<223> megAIII, KR6 

<221> misc_feature 
<222> (42168) . . . (42425) 
<223> megAIII, ACP6 

<221> misc_feature
<222> (42585) . . . (43271) 
<223> megAIII, TE 

<221> CDS 

<222> (43268) . . . (44344) 

<223> megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase;
SEQ ID NO: 16= translated amino acid sequence

<221> CDS 

<222> (44355) . . . (45623)

<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase;
SEQ ID NO: 17= translated amino acid sequence 

<221> CDS 

<222> (45620) . . . (46591) 

<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3 dehydratase,
TDP-4-keto-6-deoxyglucose 2,3 dehydratase; 
SEQ ID NO: 18= translated amino acid sequence 

<221> CDS 

<222> (46660) . . . (47403) 

<223> megH, TEII; SEQ ID NO: 19= translated amino acid sequence

<221> CDS

<222> (47411) . . . (47980) 

<223> megF, C-6 hydroxylase; SEQ ID NO: 20= translated amino acid sequence 
<400> 1 

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60
gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120 
atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180
taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240 
cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 
gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 
tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480 
ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540 
catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 
cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 
ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 
gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 
ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 
tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020 

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140 

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320 

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 1440

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560 

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740 

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340 

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400 

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460 

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640 

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940 

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060 

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240 

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480 

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780 
tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840

tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900 

tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960 

tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020 

cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080 

ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140 

tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200 

ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260 

ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320

ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380 

tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440

ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500 

tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560 

gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620

cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680

gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740 

agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800 

atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860 

gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920

ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980

ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaiaggc catcgtcccg 5040 

gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100 

ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160 

gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220 

tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280 

ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340 

cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400 

gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460

tcgcacggcc tcgaactccc dgiggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520 

gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580 

tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640 

gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700

atgtacccct ccctccctca cg^cctgcag gacagggtga tcgaggcggt gcgggaggtc 5760 

atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820 

catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880 

catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940 

cgtggaggtc gcccccaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000 

gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060 

gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120 

cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180 

caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240 

cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300 

cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360 

caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420 

ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480 

ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540 

cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600 

gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660 

ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720 

ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780 

ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840 

gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900 

ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960 

tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020 

gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080 

ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140 

atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200 

cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260 

cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320 

cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380

agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440
gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 7500

cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 7560 

cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 7620 

gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 7680 

ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 7740 

ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 7800 

cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 7860 

cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 7920 

acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 7980 

ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 8040 

tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 8100 

ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 8160 

ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 8220 

gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 8280 

ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 8340

gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 8400 

cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 8460

ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 8520 

ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 8580 

cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 8640 

ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 8700 

gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 8760

tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 8820

ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 8880 

cggcgacgat gtgtcgcgcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 8940 

ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 9000 

cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 9060 

tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 9120 

cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 9180 

cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 9240 

gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 9300 

cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 9360 

gagagcctgc cagaggctgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 9420

cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 9480 

cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 9540 

ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 9600

caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 9660 

ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 9720 

cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 9780 

ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 9840

gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 9900 

gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 9960 

gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 10020 

ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 10080 

gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 10140 

gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 10200 

gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 10260 

gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 10320 

gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 10380 

cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 10440

gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 10500 

gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 10560 

cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 10620 

ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct 10680 

ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 10740

gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 10800 

ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 10860

ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 10920 

gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 10980 

ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 11040

ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 11100 
ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 11160

cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 11220 

gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 11280 

ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 11340 

tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 11400 

gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 11460

tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 11520 

cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 11580 

tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 11640 

gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 11700 

ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 11760

ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 11820 

ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 11880 

ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 11940

tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 12000 

tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 12060 

agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 12120 

tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 12180 

gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 12240

ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 12300 

gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 12360 

gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag 12420 

ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 12480

gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 12540 

ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 12600 

gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 12660 

cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 12720 

gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 12780 

ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 12840

ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 12900 

tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 12960 

gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 13020

gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 13080 

tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 13140 

ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 13200 

ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 13260 

cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 13320 

gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 13380 

ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 13440

ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 13500 

tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 13560 

gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 13620 

gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 13680 

gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 13740 

tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 13800 

ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 13860 

gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 13920 

ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 13980 

gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 14040 

ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 14100 

ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 14160 

gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 14220

ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 14280 

accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 14340

ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 14400

cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 14460

acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 14520

aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 14580

ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 14640

accgct^tca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 14700

gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 14760
gtggagaccc
gacgcgtacg 
gggcacaccc 
gccggtgtcc 
tcgggcgcga 
cgggccgggg 
gcgccgccga 
ctctcggcga 
cgcgagcacc 
gcgctggcgt 
gacgaactcg 
cgcgtcgtct 
ctcgacggcg 
tacctggact 
cacacgctct 
tccctggcgg 
cagggggaga 
gcggtggccc 
atcgccgcct 
gtcaacggtc 
gcctcctgca 
tcctcgcacg 
ctgccgggct 
ctcgacgccg 
cgctccctcg 
accacggcga 
ctgcgacgtg 
gccggcgtcg 
ctgcccacgt 
gtcgccgact 
ccgggtgagc 
gacgaccggg 
ctggtggtgg 
ccggtggcgg 
ctggcggtga 
cgggagtgtc
ctccgcgacc 
cccgccgtct 
cacctcggga 
acgtacgccc 
ggcacggtgc 
gcccgccagg 
gtcgaggagc 
gacgtcaccg 
ctgtcggcgg 
ggtgaccgca 
ctgacccggg 
ggcgcgccgg 
cagcgacgca 
gggatggccg 
cccgaccagg 
gtcgtcgaca 
ctcttcgaca 
gtggcggcgc 
cggacgcacg 
gccttcgccg 
actgcgaccg 
ctggccggac 
gaggccccga 
ctgccggggg 
accgcctcgg 



acggcaccgg 
gcggtgaccg 
aggccgccgc 
tgccccgcac 
tcagcctgct 
tgtcctcgtt 
ccggtgacga 
gcaccggcga 
ccgaccagga 
accgtagtgg 
ccgccggtgg 
tcgtcttccc 
acccggtctt 
tcgagatcgt 
ccaccgaccg 
cccggtggcg 
ttgccgcggc
tgcgcagccg 
ccgtcgacga 
cgcgcgcggt 
ccgtcgaggg 
tcgaggccgt 
tcgtgccgtt 
ggtactggtt 
ccgaccaggg 
tcgaggagat 
gggccggcgg 
cagtggactg 
acccgttcca
ccgacgacgt 
cgggacggct 
tcgaggcggc 
agccccggac 
gcgtgctctg 
cgtcgttgtc 
cgatctgggt 
cggcccacgg 
ggggcggcct 
cgaccctgtc
gccggtggtg
tcgtcaccgg 
gcaccccgtg 
tactcaccga 
accgggagca 
tgttccacgt 
tcgaacgggc 
acgccgacct 
ggctcggcgg 
gcgagggact 
agggtccggt 
ccgtcgaggg 
tcaggtggga 
ccctcgacga 
tggccgggct 
cggctgccgt 
aactcggcgt 
gggtccggct 
acctggccgc 
cggtggcccc 
gagtggactc 
cggcacccgg 



cacccgcctc 
tgagcacccg 
cggtgtcgcc 
cctgcacgcc 
ccaggagccc 
cggcatcagc 
cacccgaccc 
ggcgttgcgc 
cctggacgac 
gttcgtgccc 
atccggggac 
cggccaggga 
cgcctcggtg 
cccgttcctg 
cgtcgacgtg 
ggcgtacggg 
gtgtgtggcc 
ggtcatcgcc 
ggtggcggcc 
ggtggtctcc 
ggtgcgggcc 
ccgtgacgcg 
ctactcgaca 
tcgcaacctg 
gtacacgacg 
cggtgaggac 
tcccgtcgac 
ggagtcggcg 
gcgtgagcgc 
ctcgtccctg 
cgacggcacc 
gcggcaggcg 
gggccgggtc 
cctgttcgct 
ggacacgctc 
ggtcaccgag
cgcgctctgg 
ggtcgacgtg 
cggcgccggc 
cagggcgggc 
cggcaccggc 
cctggtgctg 
actcgccgac 
gctccgtgcc 
cgccgcgacg 
caaccgggcg 
cgacgcgttc 
ctacgtcccg 
cccggccacc 
cgccgaccgg 
tctccgggtg 
ccggttcctc 
ggcccgtcgg 
gcccgtcggg 
cctcggccac 
cgactcgctg 
ggccacgacg 
cgaactgggc 
gaccgacgag 
accggagcag 
ggaccggagc 



ggtgatccga 
ctgcggatcg 
ggtctgatca 
gacgagccgt 
gctgcctggc 
ggcaccaacg 
gaccggatgg 
gcccgggcgg 
gtcgcctact 
gccgacgcgt 
gcggtgaccg 
tggcagtggg 
ctgcgggagt 
cgggccgagg 
gtccagccgg 
gtggaaccgg 
ggggcgctct 
accatgcccg 
cggatcgacg 
ggcgaccgtg 
aagcggctgc 
ctccacgccg 
gtcaccggcc 
cgccacaggg 
ttcctggagg 
cgtggcggtg 
ttcggctccg 
taccagggtg 
ttctggttgg 
cggtaccgca 
tggctgctgg 
ctggagtccg 
gacctggtgc 
gtcgcggagc 
gacctgaccc 
aacgccgtcg
gccctcggtc 
ccgtcgggtt 
gaggaccagg 
gcgggcggca
ggggtcggtc 
gccagccgcc 
ctgggcaccc 
ctcctcgcga 
ctcgacgacg 
aaggtgctcg 
gtgctcttct
ggcaacgcct 
tcggtggcgt 
ttccgccggc 
gcactggtgc 
ctcgcgtaca 
gccgcgcccg 
gaacgcgaga
gcctcggccg 
tcggccctgg 
acggtcttcg
ggcggatcgg
ccgatcgcca 
ctgtgggagt 
tgggatccgg



tcgaggcacg 
gctcggtcaa
aactggtgtt 
caccggagat
ccgccggcga 
cacacgcgat 
gcccggtggt 
cgcggctggc 
cgctggccac 
ccacggcgct 
gcaccgcccg 
cggggatggc
gcgccgacgc 
cgcagcgccg 
tgctgttcgc 
cggccgtcat 
cgctggacga 
gcaacggcgc 
ggcgggtcga
acgacctgga 
cggtggacta 
aactcggcga
gctgggtcga 
tccggttcgc 
tcagcgccca 
acctcgtcgc 
cgctggcccg 
ccggggcgcg 
aaccgaatcc 
tcgaatggca 
cgacgtaccc 
ccggggcgcg 
ggcggctcga
cggcggccga 
aggcggtggc 
ccgtcgggcc 
gggtcgtcgc
cggtcgccga 
tcgccctccg 
cgggccggtg
ggcacgtcgc 
ggggaccgga 
gggccaccgt 
ccgtcgacga 
gcaccgtcga 
gtgcccgcaa 
cctcctccac 
acctcgacgg 
ggggtacctg 
acggggtcat
agggtgaggt 
ccgcgcagcg 
gtcccgacgc 
aggcggtcct
agcaggtgcc
aactgcgcaa
accacccgga 
ggcgggagcg 
tcgtcgggat
tgatcgtctc
cggagttgat



ggcgctctcc 
gtccaacatc 
ggcgatgcag 
cgactggtcc 
gcggccccgc 
catcgaggag 
gccctgggtg 
cgggcaccta 
cggtcgggcc 
gcggatcctc 
cgccccgcag 
agtcgacctg 
gttggaaccg 
gacccccgac 
ggtgatggtg 
cggacactcc 
cgcggcccgg 
gatggcctcg
gatcgccgcc 
ccgcctggtc 
cgcgtcgcac 
gttccggccg 
gcccgccgaa 
cgacgcggtc 
cccggtgctc 
tgtccactcg 
cgccttcgtg 
tcgggtgccg 
ggcccgcagg 
cccgaccgat 
cggtcgggcc 
ggtcgaggac 
cgccgtgggt 
acactccccg 
cgggtcgggc
cttcgaacgg
cctggagaac 
gctgtcgcgt 
acccgacggg 
gcagccccgg 
ccggtggctg 
cgccgacggg 
caccgcctgc 
cgagcacccg 
gaccctcacc 
cctgcacgag 
cgccgcgttc 
tctcgcccag 
ggcgggcagc
ggagatgcac
agccccgatc 
ccccacccgg 
cgggccgggg 
cgacctggta 
cgtcgacagg 
ccggctgacc 
cgtacggacc 
gcccgggggc 
ggcctgccgg 
cgggcgggac
ggtctccgac 



14820 
14880 
14940 
15000 
15060 
15120 
15180 
15240 
15300 
15360 
15420 
15480 
15540 
15600 
15660 
15720 
15780 
15840 
15900 
15960 
16020 
16080 
16140 
16200 
16260 
16320 
16380 
16440 
16500 
16560 
16620 
16680 
16740 
16800 
16860
16920 
16980 
17040 
17100 
17160 
17220 
17280 
17340 
17400 
17460 
17520 
17580 
17640 
17700 
17760 
17820 
17880 
17940 
18000 
18060 
18120 
18180 
18240 
18300 
18360
18420 
acgacgggca
gcgttcttcg 
ctggagacca 
acggacaccg 
cccgaggacg 
cggatcgcgt 
tcgtcgcttg 
gcggtggcgg 
cagggcgcgt 
ggtctggggg 
gggcgtcggg 
gggttggcgg 
gcgggtgtgt 
ggggatccgg 
ggtccggtgg 
gtggtgggtg 
tgtcggggtg 

ggggtgcggg 

ggggtgtcgg 

gcggaacggc 

ccggtggtgc 

gaccacctgg 

gcccgccaac 

gaacggctgc 

tcgggtggtg 

cgggggttgt 

tcgtcggtgg 

ttggatcggg 

ttgtggcggt 

gcggcggcgg 

cgggcgcggg 

cgcgacgacg 

gcggtcaacg 

gtcgagcact 

cactccgcac 

ggccgcccgg 

gaactggacg 

gtcgaggcgc 

ctgtcgatgg 

ctggaacgcg 

cacggcgtac 

acctatccct 

gtcgccgact 

ctcgacggtc 

gaggtgcggg 

gtcaccgacc 

ggtgcggccg 

ctgtgggtgg 

gcgacggtgt 

ctgctggatc 

gccggtgccg 

cccaccccgg 

gggggcaccg 

cacctcgccc 

gacctgaccg 

tcggtcggcg 

cacgctgccg 

gacgtggtgg 

gaactgttcc 

tacgccgccg 

cccgccacct 



cccgtaccgc 

ggatctcgcc 

cctgggaggc 

gtgtcttcgt 

aggtcgacgg 

acgtgttggg 

tggcgttgca 

gtggggtgtc 

tggctccgga 

aggggtcggc 

tgttgggtgt 

cgccgtcggg 

cgggtgggga 

tggagttggg 

tggtgggttc 

tgatcaaggt 

ggttgtcggg 

ggtggccggt 

ggacgaatgc

cggtggaggg 

tgtcggcaaa 

agacgcaccc 

gent cgacag 

gcggcctcgc 

gt gtgg~gtt 

tgtcggttcc 

tggggttttc 

tggatgrggt 

ggtgtggggt 

tggtggcggg 

cgttgcgggc 

tacagaagct

gccccgacgc 

gtgacgggat

aggtcgagtc 

cgacggtgcc 

ccgactactg 

tggcagcgcg 

cggtcgggga

acaccgacga 

ccgtggactg 

tccagggacg

ggttccaccg 

gctggctggt 

ccgccctcgc 

gggtcggtga 

agaccctggc 

tcaccgtggg

gggggttggc 

tgccgcagac 

aggaccaggt 

tcaccggagc 

ccggtctggg 

tggtcagccg 

ggctcggcgt 

ccctggtgca 

gtctgcccca 

ccgtgaaggt 

tgctgttctc 

gaaacgcctt

cggtggcgtg 



cttcggcaac 

gcgtgaggcg 

gctggagaac 

gggcatgtcc 

ctacctgttg

gttggagggg 

cgtggcggcg 

ggtgatggcc 

cggcaggtgc 

cttcgtcgtg

ggtggtgggt 

ggtggcgcag 

tgtgggtgtg 

ggcgttgttg 

ggtgaaggcg 

ggtgttgggg 

gttggtggat 

gggtgtggat 

tcatgtggtg 

gtcgtcgcgg

gaccgaaacc 

cgacgtcccg 

gcgcgcggtc 

cgggggcgaa

tgtttttcct 

ggtgtttgtg 

ggtgttgggg 

gcagccggtg

tgtgcctgcg 

ggtgttgtcg 

gttggccggc

cctcgacagc 

ggtggtggtc 

cggggtccgg

gctccgggag

gttctactcc

gtaccgcaac 

tgacctcacc

gacgcttgcc 

cgtcgagcgc 

ggcggcggtc 

gcggttctgg 

ggtcgactgg 

ggtcgtaccc 

cgccggtggt 

cagcgacgcg 

gctgctgcga

ggccgtcgcc 

ccttgtcgcc

accggacccg 

agcggtccgc

cgggccgtac 

tgccgtcacc 

gcgcgggccg 

acgggtgtcg 

ggagttgaca 

gcaggtgcca

cgacggcgcg 

ctccggggcc 

cctggacgcc 

ggggctctgg 



ttcatgcccg 

ttggcgatgg 

gccggtatcc 

catcaggggt 

acaggcaaca 

ccggcgatca 

ggttcgttgc 

ggtccggagg

aagcccttct 

ttgcagcggt 

tcggcggtga 

cagcgggtga 

gtggaggcgc 

gggacgtatg 

aatgtgggtc 

ttgggtcggg 

tggtcgtcgg

ggggtgcgtc 

gtggcggagg 

gggttggtgg 

gccctgcacg 

atgaccgacg 

ctcctcgccg 

ccggggaccg 

ggtcagggtg 

gagtcggtgg 

gtgttggagg 

ttgttcgtgg 

gcggtggtgg 

gtgggtgatg 

cacggcggca 

ggcccctgga 

tccggcgacc 

gcccggacga 

gagctgctct

accctcaccg 

ctgcgccacc 

acgttcgtcg

gacgtggagt 

ttcctcacct 

ctcggctccg 

ctgcaccccg 

acggcgacgg

gaggggtaca 

gccgagccgg 

gtggtgtcga 

cgactcgacg 

cccgccggtc 

tccctggaac 

cagctacgac 

gccgacgccg 

accgccccgg 

gcccgatggc 

ggcaccgccg 

gtgcactcct 

gcagccggtg

ctgaccgaca 

gtgcacctgg 

ggggtgtggg 

ttcgcccgac 

gcggccgggg 



gggcgggcga

atccgcagca 

ggcccgagtc 

acgccaccgg 

ccgcgagcgt 

ctgtggacac 

gttctgggga 

tgttcaggga 

cggacgaggc

tgtcggtggc 

atcaggatgg 

ttcggcgggc

atgggacggg 

gggtgggtcg 

atgtgcaggc 

ggttggtggg 

gtgggttggt 

ggggtggggt 

cgccggggtc 

gggtggttgg 

cccaggcacg 

tggtgtggac 

ccgaccggac 

gtgtggtgtc 

gtcagtgggt 

tggagtgtga 

gtcggtcggg 

tgatggtgtc 

gtcattcgca

gtgcgcgggt 

tggcctcggt

cggggaagct

cccgagccgt 

tccccgtcga 

ccgtcctggc 

gtgggttcgt

cggtgcggtt 

aggtcagccc 

ccgccgtcac 

ccctcgccga 

gaaccctggt 

accgtggtcc 

ccaccgacgg 

cggacgacgg

tggtgacgac 

tgctcgggct

cacaggcgtc 

cggtgcagcg 

gcggacaccg 

cccggctggt 

tacacgcccg 

gcgggacgat

tcgccgagcg 

gcgtcgacga

gcgacgtcgg

acgtggtccg 

tggacccggc 

ccgacctgtg 

gcagtgcccg 

accggcggga

ggatgacagg 



gttcgacgcg 

gcggcacgcc 

gttgcgcggt 

ccgcccgaag 

cgcctccggt 

ggcgtgttcg 

ctgtggtctg 

gttctcccgg 

cgacggcttc 

ggtgcgggag 

ggcgagtaat 

gtggggtcgt 

gacgcggttg 

gggtggggtg 

ggcggcgggt 

tccgatggtg 

ggtggcggat 

gtcggcgttt 

ggtggtgggg 

tggtgtggtg 

tcgactcgcc 

gctgacgcag

ccaggccgtg 

gggggtggcg 

ggggatggcg 

tgcggtggtg 

tgcgccgtcg 

gttggcgcgg 

gggggagatc 

ggtggcgttg 

acgccgaggc 

ggagatcgcc

gaccgagctg 

ctacgcctcc 

cgggatcgag

cgacggcacc 

ccacgccgcc 

gcaccccgtg 

tgtgggcacc 

ggcgcacgtc 

cgacctgccc 

gcgtgacgat 

gtcggcccga 

ctgggtcgtg 

ggtcgaggag 

ggccgacgac 

caccacccca 

ccccgaacag 

gtggaccggc 

cgaggcgctc 

tcggatcgtc

cctcgtcacc 

cggtgccgaa 

ggtggtccgg 

cgaccgcgag 

gggggtggtc 

cgacctcgcc 

cccggaggcc 

tcagggtgcg 

ccggggtctg 

ggaccaggag 



18480 

18540 

18600 

18660 

18720 

18780 

18840 

18900 

18960 

19020 

19080 

19140 

19200 

19260 

19320 

19380 

19440 

19500 

19560 

19620 

19680 

19740 

19800 

19860 

19920 

19980 

20040 

20100 

20160 

20220 

20280 

20340 

20400 

20460 

20520 

20580 

20640 

20700 

20760 

20820

20880 

20940 

21000 

21060 

21120 

21180 

21240 

21300 

21360 

21420 

21480 

21540 

21600 

21660 

21720 

21780 

21840 

21900 

21960 

22020

22080
gcggtgtcgt
gcgctggaac 
gcggccttcg 
acacctgcgg 
ctggcggccc 
gccgcagccg 
ctcgggttcg 
ctgcgtctgc 
ctccacgacc 
ctggccgcgc 
gagcgcctgg 
ccgaccgccg 
cgggaactcg 
ggacctgtga 
cgggccgccc 
gcctgccgcc 
gggcacgaga 
cacccggacc 
gtggcgggct 
ccgcaacagc 
ccgcactccc 
ggcgagaacg 
gctgtcgcct 
gacaccgcgt 
ggcgagtcga 
gtcgacttca 
gccgccgacg 
gaggccgaaa 
gacggggcca 
caggcgctac 
accggcacca 
gaccgggatc 
caggcggcgg 
ctgcccgcca 
gtacgcctgg 
gtgtcggcgt 
cggaccaccg 
cggtcggcgg 
gacgtcgggc 
cgggcggcgg 
gcggtcgaac 
gtcgtcttcc 
gactcggcac 
caggactggt 
gtcgacgtgg 
tcgtacgggg 
cacgtggcgg 
ttgctgcggt 
gtacgccgcc 
cggtcggtgg 
gccgagggcg 
gacagggtcc 
atcaccttct 
tactggtacc 
gactcgggat 
gccgaggcgg 
ggcgacggcg 
gacgtcgact 
ttccaacgga 
gcctaccggg 
tggctggtgg 



tcctgcgtga 
gggtcctcac 
ccgagtcgta 
cggcggtcgg 
tgccccgggc 
tgctcggcag 
actcgctggc 
cggccaccct 
gactcggcga 
tggagcaggc 
aacggatgct 
gtgacgacct 
acgccaggtg 
ctgacaacga 
gcaagcgcct 
taccgggcgg 
cggtgtccac 
ccgaccaccc 
tcgacgccga 
ggctgctgtt 
tgcgtggcac 
gcaccgaagc 
ccgggcggat 
gctcgtcgtc 
gtctcgctgt 
gccgccagcg 
ggttcggctt 
gcaacggcca 
gcaacggtct 
gaaactgcgg 
cgctcggcga 
cggaccaccc 
cgggcgtcac 
ccctgcacgt 
cgacccgggg 
tcggcatcag 
agcgcaccgt 
cggcgctacg 
tggcggaggt 
tggtggcgtc 
cgcgcggcga 
tcttcccggg 
cggcgttcgc 
cggtctccga 
tgcagccggt 
tcacccccgc 
gtgcgctctc 
cgctgtccgg 
gactgcggtc 
tggtggccgg 
tacgggtccg 
gtgacgaact 
actcgacggt 
gcaacctgcg 
acgacgcgtt 
tcgaggaggc 
gaccgggggc 
ggacgcccgc 
agccgtactg 
tgtcctggac 
tgcaccccgg 



gcggggcgta 
cgccggggag 
cacctccgcc 
cgagcgcgac 
cgagcggtcg 
cgacgcgaag 
cgcggtccgg 
ggtcttcgag 
ggccggcgag 
cctgcccgac 
cgccgggctc 
gggggaggcc 
aacccgaact 
caaggtggcg 
gcgcgagctg 
ggtgcacctc 
cttccccacc 
cggcaccagc 
gttcttcggg
ggagaccagt 
cccgaccggc 
cggtgacgcc 
ctcctacgcc 
gttggtggcg 
cgtcggcggg 
ggcgttggcc 
ctccgagggg 
cgaggtgttg 
cgccgcgccg 
cctgaccccg 
cccgatcgag 
gctgtggctg 
cgggctgctc 
cgacgagccc 
ccggccgtgg 
cgggaccaac 
cggcggcgac 
ggcccaggcg 
cgggcggagc 
gacccgggcc 
ggacaccgtc 
acaggggtcc 
cgacacgatc
cgtgctccgg
gctgttcgcg 
tgcggtggtg 
cctcgccgac 
gggcggcggc 
gtgggaggac 
ggaaccggag 
cgagatcgac 
cctgacggtc 
cgacgtccgt 
ggagacggtc 
cgtcgaggtc 
aggtgtcgag 
gttcctgcgg 
cctcccggga 
gctgcggtcg 
gccgatcacc 
gggcagcacc 



cggccgatgt 
accgcggtgg 
cggccccggc 
gagccgcgtg 
gcggagctgg 
gccgtacccg 
ttccgtaacc 
cacccgaacg 
ccgacccccg 
gcctccgaca 
cgccccgagg 
ggcgtcgacg 
gaccgcagcc 
gagtacctcc 
caatccgacc 
ccgcagcacc 
gggcgcggct 
tacgtcgacc 
atctccccgc 
tgggagctgg 
gtcttcctcg 
gagggctatt 
ctcgggctgg 
ctgcacctgg 
gcggcggtca 
gctgacggca 
gtctccctcg 
gctgtcatcc 
aacgggaccg 
gccgacgtgg 
gccaacgccc
gggtcggtga 
aagatggtgc 
accccgcacg 
cggcggggtg 
gcccacgtga 
gtcggcccgg 
gcccaggtcg 
ctggccgtga 
gaggcggtgc 
accggggtcg 
cagtgggtcg 
cgcgcctgcg 
caggagccgg 
gtgatggtgt 
gggcactcgc 
gcggcgaggc 
atgagcgccg 
cggatctccg 
gcgctgcggg 
gtcgactacg 
acgggggaga 
gctgtcgacg 
cggttcgccg 
agcccgcatc 
gacgccgtcg 
tcggcggcca 
gctgcgacga 
tctgctcccg 
ccgcccgggg 
ggatgggtcg 



cggtgccgag 
tcgtcgccga 
cgctgctcca 
agcagaccct 
tacgcctggt 
ccaccacgcc 
ggctggccgc 
ccgcagccgt 
tccggtcggt 
cggagcgggt 
ccggagccgg 
aactcctcga 
gcagccgaag 
gtcgtgcgac 
cgatcgcggt 
tgtgggacct 
gggacctggc 
ggggtgggtt 
gcgaggccac 
tggagagcgc 
gcgtggcgcg 
cggtgaccgg 
agggtccgtc 
cggtcgagtc 
tggcgacacc 
ggtcgaaggc 
tcctgctcga 
gtggctccgc 
cccagcgcaa 
acgccgtgga 
tgctggacac 
agtcgaacat 
tggcactgcg 
tggactggtc 
accggccgag 
tcgtcgagga 
tcccgctcgt 
ccgagctggt 
cccgggcgcg 

gggggctgcg 

ccgagacgtc 
ggatgggcgc 
acgaggcgat 
gggcaccggg 
cgttggcgcg 
agggggagat 
tggtggtggg 
tcgcgctcgg 
tggccgccgt 
agtggggacg 
cctcgcactc 
tcgagccccg 
gcaccgacct 
acgcgatgac 
cggtggtggt 
tcgtcggcac 
ccgcccactg 
tcccgttgcc 
cccccgcctc 
acggcgtact 
acgggttggc 



ggcactggaa 

cgtcgactgg 

ccggctcgtc 

ccgggaccgg 

ccggcgggac 

gttcaaggac 

ccacaccggt 

cgccgacctc 

gggcgccgga 

cgagctggtc 

ggccgacgcc 

cgcgctcgaa 

cagagaccga 

gctcgacctg 

cgtcggcatg 

cctgcgccag 

cgggctcttc 

cctcgacgac 

ggccatggac 

cggcatcgat 

gctcggctac 

ggtggcaccc 

gatcagcgtg 

gctgcggctg 

aggggtgttc 

cttcggggcc 

acggctctcc 

cctcaaccag 

ggtgatccgg 

ggcgcacggc 

ctacggccgt 

cggccacacg 

ccacgaggaa 

ctcgggagcg 

gcgggccggg 

ggcacccgag 

ggtgtccgcc 

ggagggctcc 

acacgagcac 

cgaggtcgcg 

cgggcgcacc 

ggagctgctg 

ggcaccgttg 

actggaccgg 

gttgtggcag 

cgccgccgcc 

ccgcagccgg 

tgaggccgag 

caacggaccc 

ggagcgggag 

gccgcagatc 

gtcggcggag 

ggacgcgggg 

ccggttggcc 

gtcggcggtc 

cctgtcccgg 

cgccggtgtg 

gacgtacccg 

ccacgatctc 

cgacggcgac 

ggcggcgatc 



22140 
22200 
22260 
22320 
22380 
22440 
22500 
22560 
22620 
22680 
22740 
22800 
22860 
22920 
22980 
23040 
23100 
23160 
23220 
23280 
23340 
23400 
23460 
23520 
23580 
23640 
23700 
23760 
23820 
23880 
23940 
24000 
24060 
24120 
24180 
24240 
24300 
24360 
24420 
24480 
24540 
24600 
24660 
24720 
24780 
24840 
24900 
24960 
25020 
25080 
25140 
25200 
25260 
25320 
25380 
25440 
25500 
25560 
25620 
25680
25740 
accgccggcg

ctggccgagg 

accgacgaac 

ggtgacgccg 

gtcgacggtg 

cggctggagc 

gccgggacgc 

cgtggcgacc 

gggttcaccc 

ctggcccggt 

ggcgaggagt 

gaggcggagg 

gagacgttga 

gtcgcggcga 

gaacgggagg 

tacgccgccg 

gccagcgcct 

ctgcgcgagc 

ctgctccgcg 

gagggtttcg 

gaccccgacg 

atcgcggcgc 

gtcgcggagg 

gaactcggcc 

ggcctgcgga 

tacctgcgtc 

accgacgagg 

gccacccccg 

cccaccgacc 

accagctacg

ttcgggatca 

atcgcgtggg 

accggcgtct 

gaccggctca 

gcctacacct 

ctggtcgcca 

gccggcgggg 

gggctcgccg 

gccgagggcg 

caggtgctgg

gccgccccga 

ctgcgtcccg 

ccgatcgagg 

ctgggctcgg 

atcaaggcgg 

ttgtccccgc 

tggccccccg 

aacgcccacg 

gccccgggcg 

caggcgcgga 

gcccgtaccc 

gaccgggagg 

gtcgtcgccc 

tcgcagtggg 

atgggccggt 

cgtggggtcg 

gcggtgatgg 

gtgggtcact 

gacgccgcca 

ggcatggtgt

gggcgggtcg 



gtggccgggt 

cgctcgcccg 

ggcacgtcga 

gaatcgacgc 

acctggcccg 

tggcccgccg 

gtctggtcgc 

gtctctacgg 

cgcacggcac 

ggctcgccga 

tgctgaccgc 

cactgcgtac 

cgaacttcgc 

agaccgcgct 

tctactgctc 

gcagcgccta 

cggtggcctg 

gcggcctgcg 

ccggtgcggt 

cggccatccg 

gcgcgcccgt 

tgtccccgca 

tgctgggaca 

tcgactcgct 

tgccggcctc 

gactggtcgt 

ccgaacccgt 

aggacctctg 

ggggctggga 

tcgacagggg 

ccccccgcga 

aggcggtgga 

tcgtcggcat 

acggctacca 

tcgggtggga 

tccacctcgc 

tgacggtcat 

ccgacgggcg 

tcgcggcgct 

cggtgctgcg 

acgggccgtc 

ccgacgtcga 

ccggggcgct 

tgaagacgaa 

tcctggcgat 

acatcgactg 

gtgagcgccc 

tcatcgtcga 

ggcccctgcc 

ccctcgccga 

tggccaccgg 

gtgtctgcgc 

cggcggtctt 

tcggcatggc 

gcgccgaggc 

gcgaccccga 

tgtcgctggc 

cgcaggggga 

gggtggtggc 

cggtcggcac 

cggtggcggc 



cgtcgcccac 

gcgggacggc 

ggccggtgcg 

accactgtgg 

accggcgcag 

cttcggtggg 

ggcggtcctc 

ccgtcgcctg 

cgtcctggtc 

acggggtgcc 

gatccgggcc 

ggcgatcggc 

cggcgtcgcc 

gccgacggtc 

gtcggtggcc 

cctcgacgcc 

gaccccgtgg 

cagcctcgac 

gtcggtggcc 

gccgaccccg 

cgaccggccg 

ggaacagcgg 

cgagaccggc 

gggctcgatg 

gctggtcttc 

cggggactcc 

cgccgtggtc 

gcgggtggtg 

cctccggcgg 

gggattcctc 

ggcgctggcg 

acgggcgggc 

gaacggccag 

ggggttgggc 

ggggccggcg 

catgcagtcg 

ggccgacccg 

gtgcaaggcg 

cgtcctcgaa 

cggcagcgcc 

gcaggaacgg 

catggtggag 

catcgcggcg 

catcggccac 

gcggcacggc 

ggcggacggg 

ccgccgcgcc 

ggaggcaccc 

cttcgtcctg 

acacctgcgc 

tcgcgcccgt 

cgccctcgac 

cgccgcccgt 

ccgtgacctg 

gctgtcgccg 

cccgtacgac 

gcggttgtgg 

gatcgccgcc 

gttgcgcagc 

ctcccgcgcc 

ggtgaacgga 



ccggtggact 

acgttccggg 

gtcgccctgc 

tgcctgaccc 

gccgccctgc 

gtgctcgacc 

gccggcggcg 

gtcagggcga 

accggcgcgg 

acccgactcg 

gccggtgcca 

ggggagttgc 

gacgccgacc 

ctggcggagg 

ggggtctggg 

ctggtcgagc 

gccctgcccg 

gtggccgacg 

gtcgccgacg 

ctcttcgacg 

ggggagccgg 

gagacgttgc 

accgagatca 

gccctgcgtc 

gaccacccga 

gacccgaccc 

ggcatcggct 

tccgagggca 

ctctaccacc 

gacggggccc 

atggacccgc 

atcgacccgg 

tcctacctgc 

aactcggcga 

ctgacggtgg 

ctgcgtcggg 

tacaccttcg 

ttctccgcgc 

ccgttgtcca 

gtcaaccagg 

gtgatcaggc 

gcgcacggga 

tacggccggg 

acccaggccg 

gtactcccga 

aaggtcgagg 

ggggtgtcct 

gccgaaccgg 

cacggacgca 

accaccggcc 

ttcgacgtcc 

gcgctggcgc 

acccccgtcc 

ctcgactcct 

tacaccgact 

cgggtggacg 

cagtcgtacg 

gcgcacgtgg 

cgggtgctgc 

gagttggact 

cccggcacgc 



ccgtgacctc 

gggtgctgtc 

tgaccctggc 

aggaggcggt 

acggtttcgc 

tgcccgccac 

gcgaggacgt 

ccctgccgcc 

ccggtccggt 

tcctgcccgg 

ccgccgtggt 

cgaccgcgct 

ccgaggactt 

tgctcggcga 

gtggggtcgg 

accgtcgcgc 

gcgcggtcga 

ccctcgggac 

tcgactggtc 

aactcctcga 

cgggcgagtg 

tgaccctcgt 

acacccgtcg 

agcgcctggc 

cggtcaccgc 

cggtacgggt 

gccggttccc 

cctccatcac 

ccgacccgga 

cggacttcga 

agcagcggct 

agaccctcct 

aactgctgac 

gcgtgctctc 

acaccgcctg 

gtgagtgctc 

tggacttcag 

aggccgacgg 

aggcgcggcg 

acggggccag 

aggccctgac

cgggcaccga 

accgggaccg 

ccgccggtgc 

ggtcgctgca 

tgctccgcga 

ccttcggcgt 

accccgaacc 

gcgtccagac 

accgggacct 

gggccgcagt 

aggatcgccc 

tggtcttccc 

ccgaggtgtt 

gggacctgct 

tgctccagcc 

gggtgactcc 

ctggtgcgtt 

gggagctcga 

cggtcctgcg 

tcgtggtggc 



ccggaccggc 

gtgggtggcg 

gcaggcgttg 

ccgtaccccc 

ccaggtcgcc 

cgtcgacgcc 

cgtcgccgtc 

gcccggcggg 

gggcggtcgg 

cgcacacccg 

gtgcgaaccg 

cgtacacgcc 

cgccgccacc 

ccaccgcctc 

catggccgcg 

ccgggggcac 

cgacggtcgg 

gtgggaacgt 

ggtcttcaca 

ccggcgcggg 

gggtcgacga 

cggcgagacg 

ggccttcagc 

ggcccgtacc 

gctcgcgcgg 

gttcggcccc 

cggcggcatc 

caccggattc 

ccaccccggc 

ccccgggttc 

caccctggag 

cggcagcgac 

cggggagggt 

cggccgtgtc 

ctcgtcctcg 

gctggcgttg 

cgcacagcgg 

gttcgccctc 

aaacggccac 

caacggcctc 

cgcctccggg

actcggcgac 

gccgctctgg 

cgccggggtg 

cgccgacgag 

ggcacgacag 

cagcgggacc 

ggttcccgcc 

ggtccggtcc 

cgccgacacc 

gctcggcacc 

ctcgcccgac 

cgggcagggg 

cgccgagtcg 

cgacgtggtc 

ggtgctgttc 

gggtgcggtg 

gtcgttggcc 

cgaccagggc 

ccggtgggac 

cggacccacc 



25800 

25860 

25920 

25980 

26040 

26100 

26160 

26220 

26280 

26340 

26400 

26460 

26520 

26580 

26640 

26700 

26760 

26820 

26880 

26940 

27000 

27060 

27120 

27180 

27240 

27300 

27360 

27420 

27480 

27540 

27600 

27660

27720 

27780 

27840 

27900 

27960 

28020 

28080 

28140 

28200 

28260 

28320 

28380 

28440 

28500 

28560 

28620 

28680 

28740 

28800 

28860 

28920 

28980 

29040 

29100 

29160 

29220 

29280 

29340

29400 
gccgaactgg

gcggtgcgct 

gaactcggca 

gacctcctcg 

gtgctgttcg 

gtcagcccgc 

ccggtcaccg 

aacctcctgg 

ggccgcctgg 

caccgcaggg 

gccgcagtcg 

gagcagcagt 

ctggtcgacc 

gtcctgcagc 

gccgccgacg 

ccggccgagg 

ggcggccggg 

cactacgaca 

gccgcgtggc 

gggtacgcgt 

cgcgcccccg 

actgcggtac 

gacccgaccg 

gatcgggacc 

gccaccccgg 

ctgctgcgcg 

gacgacccga 

tggctcgacg 

gaggtctccc 

cgctgcgccc 

cccccggcgg 

cggctgacgc 

cccggcaacg 

cccctggcgc 

gtcctgctcg 

gtggtcaccg 

ctgttccagg 

cccgacgggt 

tacgcgctgc 

gccggcgggg 

gccacggcca 

atcgcctcgt 

ggcgtcgacg 

ctcgccgacg 

ttccggggcc 

atcctggagg 

gtgtgggagt 

ggcaagctcg 

ggcgggaccg 

ccccacctcc 

gccgacgtcg 

gaggcgctcg 

cacaccgccg 

caggtcctgc 

gacctgagct 

ggcgtgtacg 

ggactgcccg 

ggcctcggtg 

gccctgttcg 

aggtcggcgc 

acgccacggg 



acgagttcct 
acgcgtcgca 
ccgtcaccgc 
acaccacagc 
agcacgccgt 
accctgtgct 
gcgtgccgac 
gggcgcacgt 
tcgacctgcc 
ccgacacctc 
acgtacccgg 
ggctgaccca 
tcgcgctcac 
agccgctggt 
aggacgggcg 
cccggtggtc 
acggcacaca 
ccctcgccga 
agcacggcga 
tcgacccggt 
ggaagctccc 
gggtggtggc 
gtcagctcgt 
agccgcgcgg 
acccgacccc 
ccggtggtcc 
cggccgaggc 
acgaccggtg 
ccggggacga 
aggcggagtc 
tgccggacaa 
cgctcgccgg
gcggctccat 
cggaggaggt 
cgctcggcat 
aggtcgggtc 
gggccttcgg 
ggcgggcggt 
acgacctggc 
tggggatggc 
gcccggccaa 
cccgggagag 
tggtcctgaa 
gcggggtctt 
ggtacgtccc 
aggtcgtcgg 
tgtcggcggc 
tcctcaccca 
gcaccctggg 
tggtggccag 
aaggcctcgg 
cggcgctgct 
gggtcctggc 
gggccaaggt 
tcttcgtgct 
cggcggccaa 
cgaaggcgct 
accggatcgc 
acgcggctct 
tgcgccgggc 
ccgccaacag 



cgcggtggcc 
ctccccggag 
cgtcggcggc 
catggacgcc 
ccgcagcctc 
gctgatggcg 
gctgcgccgc 
gcacggggtc 
cacctacccc 
gtcgctgggg 
tcacggcgga 
gcacgtggtg 
cgccggggcc 
gttgaccgcc 
gcggccggtc 
ggcgtacgcg 
gtggcccccg 
actgggctac 
cgtggtctac 
gctgctcgac 
cttcgcctgg 
gacccccgcc 
cgccacggtg 
ccgcgacggc 
ggcggcggtg 
ggcaccacag 
ccgtcacggg 
gcccgccacc 
cgtgccgcgc 
cccggaccgc 
tccgcagctc 
tcccgtgccg
cgaggcagtg 
acgcgtcgcc 
gtacccggaa 
gggtgtccgg 
gccggtggcg 
ggacgccgca 
cgggttgcag 
tgccgtcgcg 
acacccgacg 
cgggttcggt 
ctcgctcacc
cgtcgagatg 
gttcgacctg
tctgctggcc 
cccggccgcg 
gcccgccccc 
gcggctggtc 
ccggcgcggt 
cgcgaccatc 
cgactcgatc 
cgacgggctg 
cgacgcggcg 
gttctcgtcg 
cggggtcctc 
cgggtggggc 
ccgtaccggg 
gcgcagcggc 
cgagtacgtc 
ggccgagacc 



gaggcccgcg 
gtggcccggg 
acggtcccgc 
gggtactggt 
ctggagcggg 
gtcgaggaga 
gaccacgacg 
gacgtcgacc 
ttcgacaggc 
gtccgtgact 
gcggtgttca 
ggtgggcgga 
gacgtcggcg 
gccggtgcgt 
gagatccacg 
accgggaccc 
cccggcgcca 
gagtacgggc 
gcggaggtgt 
gccgtcgccc 
cggggcgtca 
ggaccggacg 
gacgccctgg
gacctgcacc 
gtgcacgtgg 
gccgtcgtcg 
gtgctctggg 
accctggtgg 
cccggggccg 
ttcgtgctcg 
gcggtccgtg 
gccgtcgccg 
gccttcgccc 
gtccgcgcca 
ccggccgaga 
cggttcaccc 
gtcgccgacc 
gccgtaccca 
gccgggcagt 
ttggcccgtc 
ctgcgggcgc 
gagcggttcg
ggcgacctgc 
ggcaagaccg 
gccgaggccg 
gccggtgccc 
ctcacccaca 
gtgcaccccg 
gcccgccacc 
ccggcggccc 
gagatcgtcg
cccgcggacc 
gtcacctcca 
tggcacctgc 
gcggcgtcgg
aacgccctgg 
ctgtgggcgc 
gtcgccgcgc 
ggggaggtgc 
cccgaggtgc 
ccgggccggg 



agatgaggcc
tcgaacagcg
tctactccac 
accgcaacct 
gattcgagac 
ccgccgagga 
ggccgtcgga 
tgcgtccggc 
agcggctctg
cgacccaccc 
ccgggcggct 
acctggtgcc 
tgccggtgct 
tgctgcgcct 
ccgccgagga 
tcgccgtcgg 
ccgccctgac 
cggcgttcca 
ccctcgacgc 
agaccttcgg
ccctgcacgc 
cggtggccct 
tcgtcaggga
gcctggagtg 
cggccgacgg 
tccgctaccg 
cggccacgct 
tggccacgtc 
ccgccgtgtg 
tcgacggcga
acggtgcggt 
accgggcgta
ccgtccccga 
ccggcgtgaa 
tgggcaccga 
ccggccaggc 
accggctcct 
tcgcgttcac 
ccgtgctggt 
gggccggggc
tcggcctcga
ccgcgcgtac 
tcgacgagtc 
acctgcggcc 
gtcccgatcg 
tcgaccggtt
tgagccgggg 
acggaacggt
tggtgaccgg 
cgggcgcggc 
cctgcgacac 
gtccgctgac 
tcgacgggac
acgacctgac 
tgctggccgg 
ccgggcaacg 
aggccagcga
tgccgaccga 
tgttcccgct 
tgcgcggcgc 
gcctgctcga 



gcgtcggatc
gctcgccgcc 
cgccaccggg 
gcgccaaccg 
gttcatcgag 
cgccgagcgc 
gttcctccgc 
ggtcgcccac 
gcccaagccg 
gctgctgcac 
ctcccccgac 
cggcagtgtc 
ggaggaactc 
gtcggtcggc
cgtctccgac 
cgtggccggc 
gttgaccgac 
ggcgctgcgc 
cgtcgaggag 
cctgaccagt 
caccggggcc 
gcgggtcacc 
cgccggggcg 
ggtacggctg 
gctcgacgac 
tcccgacggc 
cgtgcgccgt 
cgcaggggtc 
gggggtgctg 
cccggagacg 
gttcgtgcca
ccggctggtg 
cgccgaccgg 
cttccgtgac 
ggcgtccggt 
ggtgacgggc 
caccccggtc 
caccgcccac 
ccacgccgcc 
ggaggtgttc 
cgacgaccac 
cggggggcgg
cgcgcggctg 
ggcggagcag 
gctcggcgag
gccggtgtcg 
ccgacacgtg 
gctggtcacc 
gcacggcgta
cgagctgcgc 
cgccgaccgg 
cggggtggtg 
cgccaccgat 
ccgggacgcg 
tcccgggcag 
gcgggccctc 
gatgaccagc 
gcgggcgctg 
gtctgtcgac 
ggtccggtcc 
ccgtctcgtc 



29460 
29520 
29580 
29640 
29700 
29760
29820 
29880 
29940 
30000 
30060 
30120 
30180 
30240 
30300 
30360 
30420
30480 
30540 
30600 
30660 
30720 
30780 
30840 
30900 
30960 
31020 
31080 
31140 
31200 
31260 
31320 
31380 
31440 
31500 
31560 
31620 
31680 
31740 
31800 
31860 
31920 
31980 
32040 
32100 
32160 
32220 
32280 
32340 
32400 
32460 
32520 
32580 
32640 
32700 
32760 
32820 
32880 
32940 
33000 
33060 
ggtgcacccg
gcggtcgccg 
gggttcgact 
cggctgccca 
cggtcggagt 
ctggaacggg 
ctggaggcgc 
atcagtgacg 
ggaggggacg 
acaggtccac 
atccgatgag 
ccgtcgccga 
aaccgatcgc 
cgttctggga 
gctggccgcc 
acgccgcctt 
tgatgctgga 
gcggcagcgc 
acgaggcacc 
ccggacgggt 
gctcctccgg 
ccctggtcct 
gcagccaggg 
gcttcgggct 
ccgagggccg 
gcaacgggct 
agcgggcgcg 
ggctgggcga 
ccggccgccc 
cgggggtggc 

cgttgcactt

tgtccgagac 

tcggcatcag 

ccgacctcga 

ccaccgccga 

ccctgcgcgc 

tgcgcgacac 

tcgtcggcgg

tcgacggagc 

ggcagggcgc 

cggagtccat 

aggtgctcga 

cggtgatggt 

tgggtcactc 

acgccgccag 

ggatggcgtc 

gtgcgctgac 

gcccgttgga 

ccgtcgacta 

cactggccgg 

aggtcatcga 

tgcgcttcca 

tcagcccgca 

ccgacgcgga 

tccacaccgc 

tgggtgaggg 

tcccggtccc 

ggcaccccgt 

cggcagtacc 

ccgtcgtgtt 

acggcaccgc 



agaccgatca 

gctacgactc 

cgctggcggc 

gcacgctggt 

tgttcgccga 

cgctcgacgc 

tgctgcgccg 

acgccagtga 

tctaggtgac 

cgggttcgcg 

cgagagcagc 

actcgactcg 

cgtcgtcggc 

gttcatccgc 

ggcaccgcga 

cttcggcatc 

gatctcctgg 

cggtggcgtc 

cgaggaggtg 

ggcgtacacc 

gctcaccgcg 

cgccggtggg 

cgggttggcc 

cgccgagggg 

gccggtgctg 

caccgcgccg 

gctgcgtccc 

tccgatcgag 

gctctgggtc 

cggggtgatg 

cgacgagccc 

ccggccctgg 

cggcaccaac 

cccgaccccc 

gccgggtgcg 

ccaggcggcc 

cgccttcacc 

gggcgaggag 

cgtcagcggg 

acagtggcag 

cgacgcctgc 

cggcgagcag 

gtcgttggcg 

gcagggggag 

ggtggtggcg 

gttcgggctc 

tgtcgcctcg 

cgagctgatc 

cgcctcacac 

ggtccgtccg 

aacggcgacg 

ggacgccacc 

cccggtgttg 

tccgtgtgtc 

gctcgccgag 

acgcccggtc 

cctgggccgg 

cgacctcggg 

cccggcctgg 

gtgcaccgcg 

cctgtccact 



ggtggccgcg 

ggccgaccag 

ggtggagctg 

gttcgaccac 

ctccgcgccg 

cctgcccgac 

gtggcagagc 

cgacgagctg 

aggtcgattc 

tcgcctccca 

ggcatgaccg 

gtgacaggtc 

atggcctgcc 

gacggtggtg 

ccccgcctcg 

tcaccccgcg 

gaggcgttgg 

ttcaccggtg 

ctcggctacg 

ctggggttgg 

gtgcacctgg 

gtcaccgtga 

gaggacggcc 

gccggggtcc 

gccgtactgc 

agcggccccg 

gtcgacgtgg 

gcgcacgccc 

ggatcggtga 

aagaccgtgc

tcgccgcacg 

ccggtggggg 

gcgcacgtca 

ggcccggcaa 

gaggcggtcg 

cggctcgccg 

ctggtcaccc 

gtcctcgccg 

cgggcgcgcg 

ggcatggccc 

gagcgggcgc 

tcgttggacc 

cggttgtggc 

atcgccgccg 

ttgcgcagcc 

caccccgacc 

gtcaacggtc 

gccgagtgcg 

tccccgcagg 

gtgtcggccg 

atggacgccg 

aggcagctcg 

acagtcggtg 

acaggcaccc 

gcgtacaccc 

gacctgccgg 

gtccccgaca 

cggtcctccc 

acggacgtgg 

cagtcgcgcg 

gtggtctctc 



ctggccgagc 

ctgcccgaac 

cgcaaccggc 

ccgacaccgc 

gacgtcgggg 

gcgcagggac 

cgacgacccc 

ttctcgatgc 

cgccccgcgg 

cacccgacgg 

aggaccgcct 

ggctcgacga 

ggttccccgg 

acgcgatcgc 

gtggtctcct 

aggcgctcgc 

agcgtgcggg 

tcggtgcggt 

tcggcatcgg 

agggtccagc 

cgatggagtc 

tgagcagccc 

gctgcaaacc 

tggtgctcca 

gtggctcggc 

cccagcggcg 

actacgtgga 

tgctcgacac 

agtccaacat 

tggcgctgcg 

tcgactggga 

agcgcccgcg 

tcgtcgagga 

ccggagcgac 

cactggtgtt 

accgtctcac 

gccgtgccac 

gcctccgggc 

ccggccgccg 

gggacctgct 

tcgccccgca 

ccgtcgacgt 

agtcgtacgg 

cgcacgtggc 

gggtgctgcg 

aggccgccga 

cccgttcggt 

aggccgaggg 

tggagtcgct 

ggatccccct 

actactggtt 

ccgaggcggg 

tcgaggccac 

tgcgccgcga 

ggggggtgga 

tctacccgtt 

ccggcgacga 

tggccggacg 

tccgcgacgg 

cccggatcgg 

tgctcgcgct 



tggtccgctc 

gcaaggcgtt 

tcggcgtcac 

tggcggtggc 

tcggtgcgcg 

acgccgacgt 

cggagaccga 

tcgacaggcg 

cagtggaccg 

ccggggtatc 

ccggcgctat 

ggtcgagtac 

gggtgtggac 

cgaggcgccc 

cgcggagccg 

gacggacccc 

tttcgacccg 

ggactacgga 

caccgcctcc 

cgtcaccgtc 

gctgcgccgc 

gggtgcgttc 

gttctcccgc 

acggctgtcc 

gatcaaccag 

ggtgatcagg 

ggcccacggc 

gtacggtgcc 

cggtcacacc 

gcatcgggag 

ccggggtgcg 

ccgggcgggg 

ggcgccgagc 

ccccggaacg 

ctccgcgcgc 

cgacgacccg 

ctgggagcat 

cgtcgccggg 

ggtggtgctg 

gcggcagtcg 

cgtggactgg 

ggtgcagccg 

ggtgactccg 

tggtgcgttg 

ccgtctcggt 

gcggatcgcg 

ggtgctggcc 

cgtgaccgcc 

gcgtgaggag 

gtactcgacc 

cgccaacctc 

gttcgacgcc 

cctcgaggca 

acgcggcggt 

ggtcgactgg 

ccaacgacag 

gtggcgttac 

ggtcctggtg 

cctggaacag 

cgccgcactc 

cgccgagggc 



gcacgcggcg 


33120 


caaggacctc 


33180 


caccggcgta 


33240 


cgaacacctg 


33300 


cctcgacgac 


33360 


cggggcccgc 


33420 


gccagtgacg 


33480 


tctcggcggg 


33540 


taccgccctg 


33600 


cacggaaggg 


33660 


ctcaagcgca 


33720 


cgggcccgcg 


33780 


tcgccggagg 


33840 


acggaccgtg


33900 


ggcgcgttcg 


33960 


cagcagcgcc 


34020 


tcgagcctgc 


34080 


cccaggccgg 


34140 


agcgtcgcct 


34200 


gacaccgcct 


34260 


gacgagtgca 


34320 


accgagttcc 


34380 


gccgccgacg 


34440 


gtcgcccggg 


34500 


gacggtgcca 


34560 


caggcgttgg 


34620 


accggcaccc 


34680 


gaccgggaac 


34740 


caggcggcgg 


34800 


atcccggcga 


34860 


gtgtcggtgg 


34920 


gtgtcctcgt 


34980 


ccgcaggcgg 


35040 


gatgccgccc 


35100 


gacgagcggg 


35160 


gccccctcgt 


35220 


cgggcggtcg 


35280 


ggacgtcccg 


35340 


gtcttccccg 


35400 


ccgaccttcg 


35460 


tcgctgcgcg 


35520 


gtgctgttcg 


35580 


ggtgcggtgg 


35640 


tcgttggccg 


35700 


ggtcacggcg 


35760 


cgcttcgcgg 


35820 


ggggagaacg 


35880 


cgtcggatcc 


35940 


ctgctcgccg 


36000 


ctgaccggtc 


36060 


cgggagccgg 


36120 


ttcgtcgagg 


36180 


gtgctgcccc 


36240 


ctcgcgcagt 


36300 


cgtaccgcag 


36360 


aacttctggc 


36420 


cagctcgcct 


36480 


gtgaccggag 


36540 


cgcggggcga 


36600 


gacgccgtcg 


36660


ggtgctgtcg 


36720 
acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 36780

tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 36840

cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 36900

ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 36960

tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 37020

cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 37080

tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 37140

cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 37200

acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 37260

ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 37320

cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 37380

gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 37440

gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 37500

ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 37560

gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 37620

gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 37680

cgatcgagga gc^gcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 37740

tggaccggga gcggctcg-c gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 37800

aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 37860

ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 37920

aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 37980

gtgacctggg attcgactcc a:qaccgccg tcgacctgcg gaaccggctc gcggcggtga 38040

ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 38100

cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 38160

tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 38220

tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 38280

cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 38340

ccgagcggca cggcaccagc tac:cccggc acggcgcgtt cctggacggg gcggccgact 38400

tcgacgcggc gttcttcggg atcLcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 38460

ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 38520

tgcgcggtac ggacaccggt gtcitcctcg gcgctgcgta ccaggggtac ggccagaacg 38580

cgcaggtgcc gaaggagagt gagggctacc tgctcaccgg tggttcctcg gcggtcgcct 38640

ccggtcggat cgcgtacgtg trggggttgg aggggccggc gatcactgtg gacacggcgt 38700

gttcgtcgtc gcttgtggcg trgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 38760

ggctcgcggt ggcgggtggg qtqtcggtqa tggccggtcc ggaggtgttc accgagttct 38820

ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 38880

ggttcggatt cgccgagggc gtcgccgtgg tgctcctgca gcggttgtcg gtggcggtgc 38940

gggaggggcg tcgggtgttg ggtgtgg-gg tgggttcggc ggtgaatcag gatggggcga 39000

gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 39060

gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 39120

ggttggggga tccggtggag tcgggggcgt tgttggggac gtatggggtg ggtcggggtg 39180

gggtgggtcc ggtggtggtg ggcicggtga aggcgaatgt gggtcatgtg caggcggcgg 39240

cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 39300

tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 39360

cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 39420

cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 39480

tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 39540

tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 39600

tgcacgacgc cgtcgacgac aecgtcgccc tcccggcggt ggccgccacc ctcgccaccg 39660

gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 39720

acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 39780

cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 39840

gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 39900

cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 39960

tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 40020

tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 40080

cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 40140

gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 40200

ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 40260

actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 40320

actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 40380

cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 40440

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500 

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560 

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620 

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740

ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800 

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920 

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220 

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340 

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400 

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880 

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940 

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000 

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060 

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180 

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240 

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300

actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360 

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420 

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480 

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540

agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600 

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660 

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720 

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780 

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900 

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960 

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020 

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140 

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200 

cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 43260 

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320 

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380 

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500 

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740 

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920 

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980 

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040
ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 44100

ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 44160

ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 44220

gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 44280

ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 44340

ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 44400

gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 44460

caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 44520

acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 44580

gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 44640

agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 44700

tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 44760

cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 44820

tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 44880

gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 44940

cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 45000

tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 45060

cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 45120

gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc 45180

tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg 45240

cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 45300

cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 45360

gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 45420

aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 45480

aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 45540

ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 45600

gggaacggac cgccgtcgga tgagcaccga cqccacccac gtccggctcg gccggtgcgc 45660

cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 45720

cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 45780

cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 45840

cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 45900

cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 45960

cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 46020

ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 46080

ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 46140

ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 46200

tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 46260

gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 46320

gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 46380

cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 46440

ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 46500

cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 46560

aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 46620

gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 46680

cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 46740

cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 46800

gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 46860

cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 46920

cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 46980

catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 47040

ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 47100

acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 47160

ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 47220

gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 47280

ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 47340

gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 47400

catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 47460

accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 47520

acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 47580

ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 47640

ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 47700

acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820 

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940 

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser
1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp
20 25 30

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu
35 40 45



<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 3 






























Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala
1 5 10 15

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met
20 25 30

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu
35 40 45

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val
50 55 60

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe
65 70 75 80

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys
85 90 95

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr
100 105 110

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg
115 120 125

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn
130 135 140

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala
145 150 155 160

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr
165 170 175

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr
180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly
195 200 205

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn
210 215 220

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala
225 230 235 240

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu
245 250 255

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala
260 265 270

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val
275 280 285

Thr


> ax 


Val 

V CI JL 


Val 


Ala 


Ala 


Ala 


Asn 


Arn 








veil 


rne 


i fir 


Asp 




9 on 




















JUU 
















nay 


Vox 




A TO 


Glv 


Glv 


Ben 
no £J 






lie 


Leu 


O V* 

£>er 


305 










310 










315 










320 


Ser Arg Pro Gly Ser Pro Arg Thr Asp Leu Asp Ala Leu Val Ala Thr
325 330 335

Leu Ala Thr Ala Ala Leu Arg Ala Ala Ala Pro Val Leu Pro Arg Leu
340 345 350

Ser Arg Ser Gly Pro Val Ile Arg Arg Arg Arg Ser Pro Val Ala Arg
355 360 365

Gly Leu Ser Arg Cys Pro Val Glu Leu
370 375



<210> 4 
<211> 436 
<212> PRT 

<213> Micromonospora megalomicea



<400> 4 



Met 


Arg 


v ax 


vai 


Irne 


£>er 


oer 


Met 


Ala 


val 


7S „ „ 

Asn 


Ser 


His 


T ^ . _ 

Leu 


Phe 


Gly 


1 








c 

3 










1 u 










lo 




Leu 


Val 


T~> 

tr ro 


L»eu 


r\.x a 


c ^ >» 


cl 


irne 


ijin 




Ala 


Lily 


HlS 


Glu 


Val 


Arg 








90 
























Val 


Val 


nld 




D r f~\ 


Ala 


T an 

ljcr LI 


i nr 


nop 


Asp 


vai 


l nr 


(jiy 


Ala 


GXy 


T a . * 

Leu 






35 










6 0 










H D 








Thr 


Ala 


Val 


Pro 


V^ 1 


ui y 


H e;r» 




VOX 


r: 1 n 


L5U 


V a ± 


blU 


irp 


HIS 


Ala 




50 




















D U 










His 


Ala 


Gly 




21 c o 


Tip 


Vpi 1 
v a J. 


m n 


i yr 


rJet 


A r/T 

Arg 


i nr 


lieu 


ASp 


irp 


vai 


65 










70 










I D 










© o 


Asp 


Gin 


Ser 


n jl -j 


Tkr 


i. 1 1 j_ 




o c i- 


T rn 

i rp 




A 




bcU 


uiy 


Men 


bin 










8 S 










<30 










JO 




Thr 


Thr 


Phe" 


Th r 


Pro 


Thr 


Phe 


Phe 

rue 




T oil 
J_» t; Li 


1 1c l_ 


OCX 


Dr a 


rvSp 




jueu 








100 










1 OS 










1 1 n 

11U 






He 


Asp 


Gly 


fie l. 


Val 


blU 


tr ne 


v-ys 


7\ 

Arg 


C! o >- 

oer 


Trp 


Arg 


fro 


Asp 


1 rp 


~r t _ 

He 






115 










120 










125 








Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr
130 135 140

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg
145 150 155 160

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His
165 170 175

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe
180 185 190

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln
195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val
210 215 220

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val
225 230 235 240

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr
245 250 255

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly
260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr
275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val
290 295 300

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala
305 310 315 320

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile
325 330 335

His Gly Val Pro Gln Ile Ile Leu Ser Asp Ala Asp Thr Glu Val His
340 345 350

Ala Lys Gln Leu Gln Asp Leu Gly Ala Gly Leu Ser Leu Pro Val Ala
355 360 365

Gly Met Thr Ala Glu His Leu Arg Gly Ala Ile Glu Arg Val Leu Asp
370 375 380

Glu Pro Ala Tyr Arg Leu Gly Ala Glu Arg Met Arg Asp Gly Met Arg
385 390 395 400

Thr Asp Pro Ser Pro Ala Gln Val Val Gly Ile Cys Gln Asp Leu Ala
405 410 415

Ala Asp Arg Ala Ala Arg Gly Arg Gln Pro Arg Arg Thr Ala Glu Pro
420 425 430

His Leu Pro Arg
435



























<210> 5 
<211> 390 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 5

Met Val Thr Ser Thr Asn Leu Asp Thr Thr Ala Arg Pro Ala Leu Asn
1 5 10 15

Ser Leu Thr Gly Met Arg Phe Val Ala Ala Phe Leu Val Phe Phe Thr
20 25 30

His Val Leu Ser Arg Leu Ile Pro Asn Ser Tyr Val Tyr Ala Asp Gly
35 40 45

Leu Asp Ala Phe Trp Gln Thr Thr Gly Arg Val Gly Val Ser Phe Phe
50 55 60

Phe Ile Leu Ser Gly Phe Val Leu Thr Trp Ser Ala Arg Ala Ser Asp
65 70 75 80

Ser Val Trp Ser Phe Trp Arg Arg Arg Val Cys Lys Leu Phe Pro Asn
85 90 95

His Leu Val Thr Ala Phe Ala Ala Val Val Leu Phe Leu Val Thr Gly
100 105 110

Gln Ala Val Ser Gly Glu Ala Leu Ile Pro Asn Leu Leu Leu Ile His
115 120 125

Ala Trp Phe Pro Ala Leu Glu Ile Ser Phe Gly Ile Asn Pro Val Ser
130 135 140

Trp Ser Leu Ala Cys Glu Ala Phe Phe Tyr Leu Cys Phe Pro Leu Phe
145 150 155 160

Leu Phe Trp Ile Ser Gly Ile Arg Pro Glu Arg Leu Trp Ala Trp Ala
165 170 175

Ala Val Val Phe Ala Ala Ile Trp Ala Val Pro Val Val Ala Asp Leu
180 185 190

Leu Leu Pro Ser Ser Pro Pro Leu Ile Pro Gly Leu Glu Tyr Ser Ala
195 200 205

Ile Gln Asp Trp Phe Leu Tyr Thr Phe Pro Ala Thr Arg Ser Leu Glu
210 215 220

Phe Ile Leu Gly Ile Ile Leu Ala Arg Ile Leu Ile Thr Gly Arg Trp
225 230 235 240

Ile Asn Val Gly Leu Leu Pro Ala Val Leu Leu Phe Pro Val Phe Phe
245 250 255

Val Ala Ser Leu Phe Leu Pro Gly Val Tyr Ala Ile Ser Ser Ser Met
260 265 270

Met Ile Leu Pro Leu Val Leu Ile Ile Ala Ser Gly Ala Thr Ala Asp
275 280 285

Leu Gln Gln Lys Arg Thr Phe Met Arg Asn Arg Val Met Val Trp Leu
290 295 300

Gly Asp Val Ser Phe Ala Leu Tyr Met Val His Phe Leu Val Ile Val
305 310 315 320

Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu
325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val
340 345 350

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu Leu Pro Val Met Arg Asn
355 360 365

Trp Ala Arg Pro Ala Ser Ala Arg Arg Lys Pro Ala Thr Glu Pro Glu
370 375 380

Gln Thr Pro Ser Arg Arg
385 390

<210> 6 
<211> 374 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 6 

Met Thr Thr Tyr Val Trp Ser Tyr Leu Leu Glu Tyr Glu Arg Glu Arg
1 5 10 15

Ala Asp Ile Leu Asp Ala Val Gln Lys Val Phe Ala Ser Gly Ser Leu
20 25 30

Ile Leu Gly Gln Ser Val Glu Asn Phe Glu Thr Glu Tyr Ala Arg Tyr
35 40 45

His Gly Ile Ala His Cys Val Gly Val Asp Asn Gly Thr Asn Ala Val
50 55 60

Lys Leu Ala Leu Glu Ser Val Gly Val Gly Arg Asp Asp Glu Val Val
65 70 75 80

Thr Val Ser Asn Thr Ala Ala Pro Thr Val Leu Ala Ile Asp Glu Ile
85 90 95

Gly Ala Arg Pro Val Phe Val Asp Val Arg Asp Glu Asp Tyr Leu Met
100 105 110

Asp Thr Asp Leu Val Glu Ala Ala Val Thr Pro Arg Thr Lys Ala Ile
115 120 125

Val Pro Val His Leu Tyr Gly Gln Cys Val Asp Met Thr Ala Leu Arg
130 135 140

Glu Leu Ala Asp Arg Arg Gly Leu Lys Leu Val Glu Asp Cys Ala Gln
145 150 155 160

Ala His Gly Ala Arg Arg Asp Gly Arg Leu Ala Gly Thr Met Ser Asp
165 170 175

Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly
180 185 190

Asp Gly Gly Ala Val Val Thr Asn Asp Asp Glu Thr Ala Arg Ala Leu
195 200 205

Arg Arg Leu Arg Tyr Tyr Gly Met Glu Glu Val Tyr Tyr Val Thr Arg
210 215 220

Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu
225 230 235 240

Arg Arg Lys Leu Thr Arg Leu Asp Ala Tyr Val Ala Gly Arg Arg Ala
245 250 255

Val Ala Gln Arg Tyr Val Asp Gly Leu Ala Asp Leu Gln Asp Ser His
260 265 270

Gly Leu Glu Leu Pro Val Val Thr Asp Gly Asn Glu His Val Phe Tyr
275 280 285

Val Tyr Val Val Arg His Pro Arg Arg Asp Glu Ile Ile Lys Arg Leu
290 295 300

Arg Asp Gly Tyr Asp Ile Ser Leu Asn Ile Ser Tyr Pro Trp Pro Val
305 310 315 320

His Thr Met Thr Gly Phe Ala His Leu Gly Val Ala Ser Gly Ser Leu
325 330 335

Pro Val Thr Glu Arg Leu Ala Gly Glu Ile Phe Ser Leu Pro Met Tyr
340 345 350

Pro Ser Leu Pro His Asp Leu Gln Asp Arg Val Ile Glu Ala Val Arg
355 360 365

Glu Val Ile Thr Gly Leu
370

<210> 7 
<211> 257 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 7 



Met Pro Asn Ser His Ser Thr Thr Ser Ser Thr Asp Val Ala Pro Tyr
1 5 10 15

Glu Arg Ala Asp Ile Tyr His Asp Phe Tyr His Gly Arg Gly Lys Gly
20 25 30

Tyr Arg Ala Glu Ala Asp Ala Leu Val Glu Val Ala Arg Lys His Thr
35 40 45

Pro Gln Ala Ala Thr Leu Leu Asp Val Ala Cys Gly Thr Gly Ser His
50 55 60

Leu Val Glu Leu Ala Asp Ser Phe Arg Glu Val Val Gly Val Asp Leu
65 70 75 80

Ser Ala Ala Met Leu Ala Thr Ala Ala Arg Asn Asp Pro Gly Arg Glu
85 90 95

Leu His Gln Gly Asp Met Arg Asp Phe Ser Leu Asp Arg Arg Phe Asp
100 105 110

Val Val Thr Cys Met Phe Ser Ser Thr Gly Tyr Leu Val Asp Glu Ala
115 120 125

Glu Leu Asp Arg Ala Val Ala Asn Leu Ala Gly His Leu Ala Pro Gly
130 135 140

Gly Thr Leu Val Val Glu Pro Trp Trp Phe Pro Glu Thr Phe Arg Pro
145 150 155 160

Gly Trp Val Gly Ala Asp Leu Val Thr Ser Gly Asp Arg Arg Ile Ser
165 170 175

Arg Met Ser His Thr Val Pro Ala Gly Leu Pro Asp Arg Thr Ala Ser
180 185 190

Arg Met Thr Ile His Tyr Thr Val Gly Ser Pro Glu Ala Gly Ile Glu
195 200 205

His Phe Thr Glu Val His Val Met Thr Leu Phe Ala Arg Ala Ala Tyr
210 215 220

Glu Gln Ala Phe Gln Arg Ala Gly Leu Ser Cys Ser Tyr Val Gly His
225 230 235 240

Asp Leu Phe Ser Pro Gly Leu Phe Val Gly Val Ala Ala Glu Pro Gly
245 250 255

Arg



<210> 8 
<211> 201 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 8 



Met Arg Val Glu Glu Leu Gly Ile Glu Gly Val Phe Thr Phe Thr Pro

1 5 10 15

Gln Thr Phe Ala Asp Glu Arg Gly Val Phe Gly Thr Ala Tyr Gln Glu

20 25 30

Asp Val Phe Val Ala Ala Leu Gly Arg Pro Leu Phe Pro Val Ala Gln

35 40 45

Val Ser Thr Thr Arg Ser Arg Arg Gly Val Val Arg Gly Val His Phe

50 55 60










Thr Thr Met Pro Gly Ser Met Ala Lys Tyr Val Tyr Cys Ala Arg Gly

65 70 75 80


Arg Ala Met Asp Phe Ala Val Asp Ile Arg Pro Gly Ser Pro Thr Phe

85 90 95

Gly Arg Ala Glu Pro Val Glu Leu Ser Ala Glu Ser Met Val Gly Leu

100 105 110






fur 
l y i. 


LrGU 


Pro 


Val 


Gly


Met 


Gly


His 


Leu 


Phe 


Val 


Ser 


Leu 


Glu 


Asp


Asp






115 











120 










125 








Thr 


Thr 


Leu 


Val 


Tyr


T.pi \ 


Met 


Ser 


Ala 


Gly


i yi 


V G JL 


Pro


A cio 








130 










135 
Glu Leu Ala Ala Arg Val Glu Arg Trp Asp Asp Asp Val

245 250 255

Val Pro Ala Gly Val Asn Gly Pro Arg Ser Val Leu Leu Thr Gly Ala

260 265 270

Pro Glu Pro Ile Ala Arg Arg Val Ala Glu Leu Ala Ala Gln Gly Val

275 280 285

Arg Ala Gln Val Val Asn Val Ser Met Ala Ala His Ser Ala Gln Val

290 295 300










Asp Ala Val Ala Glu Gly Met Arg Ser Ala Leu Thr Trp Phe Ala Pro

305 310 315 320

Gly Asp Ser Asp Val Pro Tyr Tyr Ala Gly Leu Thr Gly Gly Arg Leu

325 330 335

Asp Thr Arg Glu Leu Gly Ala Asp His Trp Pro Arg Ser Phe Arg Leu

340 345 350

Pro Val Arg Phe Asp Glu Ala Thr Arg Ala Val Leu Glu Leu Gln Pro

355 360 365








Gly Thr Phe Ile Glu Ser Ser Pro His Pro Val Leu Ala Ala Ser Leu

370 375 380

Gln Gln Thr Leu Asp Glu Val Gly Ser Pro Ala Ala Ile Val Pro Thr

385 390 395 400

Leu Gln Arg Asp Gln Gly Gly Leu Arg Arg Phe Leu Leu Ala Val Ala

405 410 415

Gln Ala Tyr Thr Gly Gly Val Thr Val Asp Trp Thr Ala Ala Tyr Pro

420 425 430






Gly Val Thr Pro Gly His Leu Pro Ser Ala Val Ala Val Glu Thr Asp

435 440 445

Glu Gly Pro Ser Thr Glu Phe Asp Trp Ala Ala Pro Asp His Val Leu

450 455 460

Arg Ala Arg Leu Leu Glu Ile Val Gly Ala Glu Thr Ala Ala Leu Ala

465 470 475 480

Gly Arg Glu Val Asp Ala Arg Ala Thr Phe Arg Glu Leu Gly Leu Asp

485 490 495




Ser Val Leu Ala Val Gln Leu Arg Thr Arg Leu Ala Thr Ala Thr Gly

500 505 510

Arg Asp Leu His Ile Ala Met Leu Tyr Asp His Pro Thr Pro His Ala

515 520 525

Leu Thr Glu Ala Leu Leu Arg Gly Pro Gln Glu Glu Pro Gly Arg Gly

530 535 540

Glu Glu Thr Ala His Pro Thr Glu Ala Glu Pro Asp Glu Pro Val Ala

545 550 555 560


Val Val Ala Met Ala Cys Arg Leu Pro Gly Gly Val Thr Ser Pro Glu

565 570 575

Glu Phe Trp Glu Leu Leu Ala Glu Gly Arg Asp Ala Val Gly Gly Leu

580 585 590

Pro Thr Asp Arg Gly Trp Asp Leu Asp Ser Leu Phe His Pro Asp Pro

595 600 605

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly

610 615 620

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala
625 630 635 640

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu

645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg

660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu

675 680 685 

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr 

690 695 700 

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly 
705 710 715 720 

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val

725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu

740 745 750 

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe 

755 760 765 

Ser Arg Met Asn Ser Leu Ala Pro Asp Gly Arg Ser Lys Ala Phe Ser 

770 775 780 

Ala Ala Ala Asp Gly Phe Gly Met Ala Glu Gly Ala Gly Met Leu Leu 
785 790 795 800 

Leu Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Pro Val Leu Ala 

805 810 815 

Val Ile Arg Gly Thr Ala Val Asn Ser Asp Gly Ala Ser Asn Gly Leu

820 825 830

Ser Ala Pro Asn Gly Arg Ala Gln Val Arg Val Ile Arg Gln Ala Leu

835 840 845

Ala Glu Ser Gly Leu Thr Pro His Thr Val Asp Val Val Glu Thr His

850 855 860

Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala Arg Ala Leu Ser
865 870 875 880

Asp Ala Tyr Gly Gly Asp Arg Glu His Pro Leu Arg Ile Gly Ser Val

885 890 895

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Leu

900 905 910

Ile Lys Leu Val Leu Ala Met Gln Ala Gly Val Leu Pro Arg Thr Leu

915 920 925

His Ala Asp Glu Pro Ser Pro Glu Ile Asp Trp Ser Ser Gly Ala Ile

930 935 940

Ser Leu Leu Gln Glu Pro Ala Ala Trp Pro Ala Gly Glu Arg Pro Arg
945 950 955 960

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Ala

965 970 975

Ile Ile Glu Glu Ala Pro Pro Thr Gly Asp Asp Thr Arg Pro Asp Arg

980 985 990 

Met Gly Pro Val Val Pro Trp Val Leu Ser Ala Ser Thr Gly Glu Ala 

995 1000 1005 

Leu Arg Ala Arg Ala Ala Arg Leu Ala Gly His Leu Arg Glu His Pro 

1010 1015 1020 

Asp Gln Asp Leu Asp Asp Val Ala Tyr Ser Leu Ala Thr Gly Arg Ala
1025 1030 1035 1040 

Ala Leu Ala Tyr Arg Ser Gly Phe Val Pro Ala Asp Ala Ser Thr Ala 

1045 1050 1055 

Leu Arg Ile Leu Asp Glu Leu Ala Ala Gly Gly Ser Gly Asp Ala Val

1060 1065 1070

Thr Gly Thr Ala Arg Ala Pro Gln Arg Val Val Phe Val Phe Pro Gly
1075 1080 1085

Gln Gly Trp Gln Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp

1090 1095 1100 

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro 
1105 1110 1115 1120 

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg
1185 1190 1195 1200

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245 

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr 

1250 1255 1260 

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His 
1265 1270 1275 1280 

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly 

1285 1290 1295 

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr 

1300 1305 1310 

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg 

1315 1320 1325 

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala 

1330 1335 1340

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390 

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu 

1395 1400 1405 

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455 

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu 

1460 1465 1470 

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg 

1475 1480 1485 

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500 

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly 
1505 1510 1515 1520 

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala 

1525 1530 1535 

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu 

1540 1545 1550 

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn 
1585 1590 1595 1600 

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala 

1605 1610 1615

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp 

1620 1625 1630 

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660 

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu 
1665 1670 1675 1680 

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695 

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly 

1700 1705 1710 

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725 

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val 

1730 1735 1740 

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr 
1745 1750 1755 1760 

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775 

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu 

1780 1785 1790 

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr 

1795 1800 1805 

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val

1845 1850 1855

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900 

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly 
1905 1910 1915 1920 

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val 

1925 1930 1935 

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser 

1940 1945 1950 

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr 
1985 1990 1995 2000 

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu 

2005 2010 2015 

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala
2050 2055 2060

Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080 

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly 

2085 2090 2095 

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser

2165 2170 2175

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205 

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly 

2210 2215 2220 

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg 
2225 2230 2235 2240 

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val

2275 2280 2285

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320 

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr 

2325 2330 2335 

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr 

2340 2345 2350 

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val 

2355 2360 2365 

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400 

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu 

2405 2410 2415 

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val 

2420 2425 2430 

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 

2435 2440 2445 

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro 

2450 2455 2460 

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val 
2465 2470 2475 2480 

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495 

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr 

2500 2505 2510 

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala
2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575 

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser 

2580 2585 2590 

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val 

2595 2600 2605 

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val 

2610 2615 2620 

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640 

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser 

2645 2650 2655 

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670 

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu 

2675 2680 2685 

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val 

2690 2695 2700 

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720 

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala 

2725 2730 2735 

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr
2785 2790 2795 2800

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815 

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe 

2820 2825 2830 

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr 

2835 2840 2845 

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp 

2850 2855 2860 

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val 
2865 2870 2875 2880 

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu 

2885 2890 2895 

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910 

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val 

2915 2920 2925 

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg 

2930 2935 2940 

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val 
2945 2950 2955 2960 

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr 

2965 2970 2975 

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val 

2980 2985 2990 

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu 

2995 3000 3005 

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120 

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu 

3125 3130 3135 

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr 

3140 3145 3150 

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg 

3155 3160 3165 

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala 

3170 3175 3180 

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215 

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His 

3220 3225 3230 

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser 

3235 3240 3245 

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260 

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu 
3265 3270 3275 3280 

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr 

3285 3290 3295 

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310 

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala 

3315 3320 3325 

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala

3330 3335 3340

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val 
3345 3350 3355 3360 

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375 

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu 

3380 3385 3390 

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp 

3395 3400 3405 

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp 

3410 3415 3420 

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly 
3425 3430 3435 3440 

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala 

3445 3450 3455 

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr 

3460 3465 3470 

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485 

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu 

3490 3495 3500 

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala 
3505 3510 3515 3520 

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu 
Asp Ala Leu



3525 
Glu Arg Glu 
3540 



Leu Asp 



3530 
Ala Arg 
3545 



3535 



<210> 14 
<211> 3562 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 14 
Met Thr Asp Asn Asp Lys Val Ala Glu Tyr Leu Arg Arg Ala Thr Leu

1 5 10 15

Asp Leu Arg Ala Ala Arg Lys Arg Leu Arg Glu Leu Gln Ser Asp Pro

20 25 30

Ile Ala Val Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val His Leu

35 40 45

Pro Gln His Leu Trp Asp Leu Leu Arg Gln Gly His Glu Thr Val Ser

50 55 60

Thr Phe Pro Thr Gly Arg Gly Trp Asp Leu Ala Gly Leu Phe His Pro

65 70 75 80

Asp Pro Asp His Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu

85 90 95

Asp Asp Val Ala Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro Arg

100 105 110

Glu Ala Thr Ala Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser

115 120 125

Trp Glu Leu Val Glu Ser Ala Gly Ile Asp Pro His Ser Leu Arg Gly

130 135 140

Thr Pro Thr Gly Val Phe Leu Gly Val Ala Arg Leu Gly Tyr Gly Glu

145 150 155 160

Asn Gly Thr Glu Ala Gly Asp Ala Glu Gly Tyr Ser Val Thr Gly Val

165 170 175

Ala Pro Ala Val Ala Ser Gly Arg Ile Ser Tyr Ala Leu Gly Leu Glu

180 185 190

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala

195 200 205

Leu His Leu Ala Val Glu Ser Leu Arg Leu Gly Glu Ser Ser Leu Ala

210 215 220

Val Val Gly Gly Ala Ala Val Met Ala Thr Pro Gly Val Phe Val Asp

225 230 235 240

Phe Ser Arg Gln Arg Ala Leu Ala Ala Asp Gly Arg Ser Lys Ala Phe

245 250 255

Gly Ala Ala Ala Asp Gly Phe Gly Phe Ser Glu Gly Val Ser Leu Val

260 265 270

Leu Leu Glu Arg Leu Ser Glu Ala Glu Ser Asn Gly His Glu Val Leu

275 280 285

Ala Val Ile Arg Gly Ser Ala Leu Asn Gln Asp Gly Ala Ser Asn Gly

290 295 300

Leu Ala Ala Pro Asn Gly Thr Ala Gln Arg Lys Val Ile Arg Gln Ala

305 310 315 320

Leu Arg Asn Cys Gly Leu Thr Pro Ala Asp Val Asp Ala Val Glu Ala

325 330 335

His Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ala Asn Ala Leu

340 345 350

Leu Asp Thr Tyr Gly Arg Asp Arg Asp Pro Asp His Pro Leu Trp Leu

355 360 365

Gly Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val

370 375 380

Thr Gly Leu Leu Lys Met Val Leu Ala Leu Arg His Glu Glu Leu Pro

385 390 395 400

Ala Thr Leu His Val Asp Glu Pro Thr Pro His Val Asp Trp Ser Ser

405 410 415




Gly Ala Val Arg Leu Ala Thr Arg Gly Arg Pro Trp Arg Arg Gly Asp

420 425 430

Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Ile Ser Gly Thr Asn

435 440 445

Ala His Val Ile Val Glu Glu Ala Pro Glu Arg Thr Thr Glu Arg Thr

450 455 460

Val Gly Gly Asp Val Gly Pro Val Pro Leu Val Val Ser Ala Arg Ser

465 470 475 480


Ala Ala Ala Leu Arg Ala Gln Ala Ala Gln Val Ala Glu Leu Val Glu

485 490 495

Gly Ser Asp Val Gly Leu Ala Glu Val Gly Arg Ser Leu Ala Val Thr

500 505 510

Arg Ala Arg His Glu His Arg Ala Ala Val Val Ala Ser Thr Arg Ala

515 520 525

Glu Ala Val Arg Gly Leu Arg Glu Val Ala Ala Val Glu Pro Arg Gly

530 535 540










Glu Asp Thr Val Thr Gly Val Ala Glu Thr Ser Gly Arg Thr Val Val

545 550 555 560

Phe Leu Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Ala Glu

565 570 575

Leu Leu Asp Ser Ala Pro Ala Phe Ala Asp Thr Ile Arg Ala Cys Asp

580 585 590

Glu Ala Met Ala Pro Leu Gln Asp Trp Ser Val Ser Asp Val Leu Arg

595 600 605








Gln Glu Pro Gly Ala Pro Gly Leu Asp Arg Val Asp Val Val Gln Pro

610 615 620

Val Leu Phe Ala Val Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr

625 630 635 640

Gly Val Thr Pro Ala Ala Val Val Gly His Ser Gln Gly Glu Ile Ala

645 650 655

Ala Ala His Val Ala Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Leu

660 665 670






Val Val Gly Arg Ser Arg Leu Leu Arg Ser Leu Ser Gly Gly Gly Gly

675 680 685

Met Ser Ala Val Ala Leu Gly Glu Ala Glu Val Arg Arg Arg Leu Arg

690 695 700

Ser Trp Glu Asp Arg Ile Ser Val Ala Ala Val Asn Gly Pro Arg Ser

705 710 715 720

Val Val Val Ala Gly Glu Pro Glu Ala Leu Arg Glu Trp Gly Arg Glu

725 730 735




Arg Glu Ala Glu Gly Val Arg Val Arg Glu Ile Asp Val Asp Tyr Ala

740 745 750

Ser His Ser Pro Gln Ile Asp Arg Val Arg Asp Glu Leu Leu Thr Val

755 760 765

Thr Gly Glu Ile Glu Pro Arg Ser Ala Glu Ile Thr Phe Tyr Ser Thr

770 775 780

Val Asp Val Arg Ala Val Asp Gly Thr Asp Leu Asp Ala Gly Tyr Trp

785 790 795 800


Tyr Arg Asn Leu Arg Glu Thr Val Arg Phe Ala Asp Ala Met Thr Arg

805 810 815

Leu Ala Asp Ser Gly Tyr Asp Ala Phe Val Glu Val Ser Pro His Pro

820 825 830

Val Val Val Ser Ala Val Ala Glu Ala Val Glu Glu Ala Gly Val Glu

835 840 845

Asp Ala Val Val Val Gly Thr Leu Ser Arg Gly Asp Gly Gly Pro Gly

850 855 860










Ala Phe Leu Arg Ser Ala Ala Thr Ala His Cys Ala Gly Val Asp Val

865 870 875 880

Asp Trp Thr Pro Ala Leu Pro Gly Ala Ala Thr Ile Pro Leu Pro Thr

885 890 895
Tyr Pro Phe Gln Arg Lys Pro Tyr Trp Leu Arg Ser Ser Ala Pro Ala

900 905 910

Pro Ala Ser His Asp Leu Ala Tyr Arg Val Ser Trp Thr Pro Ile Thr

915 920 925

Pro Pro Gly Asp Gly Val Leu Asp Gly Asp Trp Leu Val Val His Pro

930 935 940

Gly Gly Ser Thr Gly Trp Val Asp Gly Leu Ala Ala Ala Ile Thr Ala

945 950 955 960

Gly Gly Gly Arg Val Val Ala His Pro Val Asp Ser Val Thr Ser Arg

965 970 975

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly

980 985 990

Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala

995 1000 1005






Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp

1010 1015 1020

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp

1025 1030 1035 1040

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln

1045 1050 1055

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu

1060 1065 1070




Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu

1075 1080 1085

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr

1090 1095 1100

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe

1105 1110 1115 1120

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly

1125 1130 1135


Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val

1140 1145 1150

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala

1155 1160 1165

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg

1170 1175 1180

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr

1185 1190 1195 1200


Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala

1205 1210 1215

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val

1220 1225 1230

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala

1235 1240 1245

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala

1250 1255 1260








Tyr 


Leu 


Asp 


Ala 


Leu 


Val 


Glu 


His 


Arg 


Arg 


Ala 


Arg 


Gly 


His 


Ala 


Ser 


1265 








1270 








1275 








1280 


Ala 


Ser 


Val 


Ala 


Trp 


Thr 


Pro 


Trp 


Ala 


Leu 


Pro 


Gly 


Ala 


Val 


Asp 


Asp 










1285 








1290 








1295 


Gly 


Arg 


Leu 


Arg 


Glu 


Arg 


Gly 


Leu 


Arg 


Ser 


Leu 


Asp 


Val 


Ala 


Asp 


Ala 








1300 








1305 








1310 




Leu 


Gly Thr Trp Glu Arg Leu 


Leu 


Arg 


Ala 


Gly 


Ala 


Val 


Ser 


Val 


Ala 






1315 








1320 








1325 






Val 


Ala 


Asp 


Val 


Asp 


Trp 


Ser 


Val 


Phe 


Thr 


Glu 


Gly 


Phe 


Ala 


Ala 


He 




1330 








1335 








1340 








Arg 


Pro 


Thr 


Pro 


Leu 


Phe 


Asp 


Glu 


Leu 


Leu 


Asp 


Arg 


Arg 


Gly Asp 


Pro 


134 5 








1350 








1355 








1360 


Asp 


Gly 


Ala 


Pro 


Val 


Asp 


Arg 


Pro 


Gly 


Glu 


Pro 


Ala 


Gly 


Glu 


Trp 


Gly 










1365 








1370 








1375 


Arg 


Arg 


lie 


Ala 


Ala 


Leu 


Ser 


Pro 


Gin 


Glu 


Gin 


Arg 


Glu 


Thr 


Leu 


Leu 
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1380 1385 1390 

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly 

1395 1400 1405 

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser

1410 1415 1420 

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu
1425 1430 1435 1440 

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu 

1445 1450 1455 

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro 

1460 1465 1470 

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val 

1475 1480 1485 

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu

1490 1495 1500 

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr
1505 1510 1515 1520 

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His 

1525 1530 1535 

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro 

1540 1545 1550 

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala

1555 1560 1565 

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val

1570 1575 1580 

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly
1585 1590 1595 1600 

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly

1605 1610 1615 

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser

1620 1625 1630 

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala

1635 1640 1645 

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu

1650 1655 1660 

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly
1665 1670 1675 1680

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala 

1685 1690 1695 

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln

1700 1705 1710 

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu 

1715 1720 1725 

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu

1730 1735 1740 

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala
1745 1750 1755 1760 

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala

1765 1770 1775 

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr 

1780 1785 1790 

Gly Thr Glu Leu Gly Asp Pro Ile Glu Ala Gly Ala Leu Ile Ala Ala

1795 1800 1805 

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr 

1810 1815 1820 

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Ala Ala Gly Val Ile Lys
1825 1830 1835 1840 

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala 

1845 1850 1855 

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val

1860 1865 1870 
Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala

1875 1880 1885 

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val Ile Val

1890 1895 1900 

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro 
1905 1910 1915 1920 

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gln Thr Val

1925 1930 1935 

Arg Ser Gln Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His

1940 1945 1950 

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg 

1955 1960 1965 

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys 

1970 1975 1980 

Ala Ala Leu Asp Ala Leu Ala Gln Asp Arg Pro Ser Pro Asp Val Val
1985 1990 1995 2000 

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly 

2005 2010 2015 

Gln Gly Ser Gln Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser

2020 2025 2030 

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro 

2035 2040 2045 

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro 

2050 2055 2060 

Asp Pro Tyr Asp Arg Val Asp Val Leu Gln Pro Val Leu Phe Ala Val
2065 2070 2075 2080 

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

2085 2090 2095 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

2100 2105 2110 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 

2115 2120 2125 

Arg Val Leu Arg Glu Leu Asp Asp Gln Gly Gly Met Val Ser Val Gly

2130 2135 2140 

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg 
2145 2150 2155 2160 

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly 

2165 2170 2175 

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu 

2180 2185 2190 

Met Arg Pro Arg Arg Ile Ala Val Arg Tyr Ala Ser His Ser Pro Glu

2195 2200 2205 

Val Ala Arg Val Glu Gln Arg Leu Ala Ala Glu Leu Gly Thr Val Thr

2210 2215 2220 

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu 
2225 2230 2235 2240 

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg 

2245 2250 2255 

Gln Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly

2260 2265 2270 

Phe Glu Thr Phe Ile Glu Val Ser Pro His Pro Val Leu Leu Met Ala

2275 2280 2285 

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro 

2290 2295 2300 

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu 
2305 2310 2315 2320 

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val 

2325 2330 2335 

Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gln

2340 2345 2350 

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly 
2355 2360 2365

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro 

2370 2375 2380 

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gln
2385 2390 2395 2400 

Gln Trp Leu Thr Gln His Val Val Gly Gly Arg Asn Leu Val Pro Gly

2405 2410 2415 

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val 

2420 2425 2430 

Pro Val Leu Glu Glu Leu Val Leu Gln Gln Pro Leu Val Leu Thr Ala

2435 2440 2445 

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly 

2450 2455 2460 

Arg Arg Pro Val Glu Ile His Ala Ala Glu Asp Val Ser Asp Pro Ala
2465 2470 2475 2480 

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val 

2485 2490 2495 

Ala Gly Gly Gly Arg Asp Gly Thr Gln Trp Pro Pro Pro Gly Ala Thr

2500 2505 2510 

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr 

2515 2520 2525 

Glu Tyr Gly Pro Ala Phe Gln Ala Leu Arg Ala Ala Trp Gln His Gly

2530 2535 2540 

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr 
2545 2550 2555 2560 

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gln Thr Phe Gly Leu

2565 2570 2575 

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr 

2580 2585 2590 

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala 

2595 2600 2605 

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gln Leu

2610 2615 2620 

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg 
2625 2630 2635 2640 

Asp Gln Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val

2645 2650 2655 

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala 

2660 2665 2670 

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gln

2675 2680 2685 

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu 

2690 2695 2700 

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu 
2705 2710 2715 2720 

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala 

2725 2730 2735 

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala 

2740 2745 2750 

Ala Val Trp Gly Val Leu Arg Cys Ala Gln Ala Glu Ser Pro Asp Arg

2755 2760 2765 

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp 

2770 2775 2780 

Asn Pro Gln Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu
2785 2790 2795 2800 

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg 

2805 2810 2815

Leu Val Pro Gly Asn Gly Gly Ser Ile Glu Ala Val Ala Phe Ala Pro

2820 2825 2830 

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala 
2835 2840 2845 
Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly

2850 2855 2860 

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val 
2865 2870 2875 2880 

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gln Ala Val

2885 2890 2895 

Thr Gly Leu Phe Gln Gly Ala Phe Gly Pro Val Ala Val Ala Asp His

2900 2905 2910 

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala 

2915 2920 2925 

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu

2930 2935 2940 

Ala Gly Leu Gln Ala Gly Gln Ser Val Leu Val His Ala Ala Ala Gly
2945 2950 2955 2960 

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu 

2965 2970 2975 

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu 

2980 2985 2990 

Gly Leu Asp Asp Asp His Ile Ala Ser Ser Arg Glu Ser Gly Phe Gly

2995 3000 3005 

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu 

3010 3015 3020 

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala 
3025 3030 3035 3040 

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala 

3045 3050 3055 

Glu Gln Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly

3060 3065 3070 

Pro Asp Arg Leu Gly Glu Ile Leu Glu Glu Val Val Gly Leu Leu Ala

3075 3080 3085 

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala 

3090 3095 3100

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys 
3105 3110 3115 3120 

Leu Val Leu Thr Gln Pro Ala Pro Val His Pro Asp Gly Thr Val Leu

3125 3130 3135 

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu 

3140 3145 3150

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly 

3155 3160 3165 

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu 

3170 3175 3180 

Gly Ala Thr Ile Glu Ile Val Ala Cys Asp Thr Ala Asp Arg Glu Ala
3185 3190 3195 3200 

Leu Ala Ala Leu Leu Asp Ser Ile Pro Ala Asp Arg Pro Leu Thr Gly

3205 3210 3215 

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser Ile

3220 3225 3230 

Asp Gly Thr Ala Thr Asp Gln Val Leu Arg Ala Lys Val Asp Ala Ala

3235 3240 3245 

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val 

3250 3255 3260 

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gln Gly Val
3265 3270 3275 3280 

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gln Arg Arg

3285 3290 3295 

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gln

3300 3305 3310 

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg Ile Ala Arg Thr Gly

3315 3320 3325 

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala
3330 3335 3340

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser 
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly 

3380 3385 3390 

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gln Val Ala Ala

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gln Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gln Gly His Ala Asp Val Gly Ala Arg Leu Glu
3505 3510 3515 3520 

Ala Leu Leu Arg Arg Trp Gln Ser Arg Arg Pro Pro Glu Thr Glu Pro

3525 3530 3535 

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu

3540 3545 3550 

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val 
3555 3560 



<210> 15 
<211> 3201
<212> PRT 

<213> Micromonospora megalomicea 

Met Ser Glu Ser Ser Gly Met Thr Glu Asp Arg Leu Arg Arg Tyr Leu
1 5 10 15

Lys Arg Thr Val Glu Leu Asp Ser Val Thr Gly Arg Leu Asp Glu

20 25 30

Val Glu Tyr Arg Ala Arg Glu Pro Ile Ala Val Val Gly Met Ala Cys

35 40 45

Arg Phe Pro Gly Gly Val Asp Ser Pro Glu Ala Phe Trp Glu Phe Ile

50 55 60

Arg Asp Gly Gly Asp Ala Ile Ala Glu Ala Pro Thr Asp Arg Gly Trp
65 70 75 80

Pro Pro Ala Pro Arg Pro Arg Leu Gly Gly Leu Leu Ala Glu Pro Gly

85 90 95

Ala Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

100 105 110

Thr Asp Pro Gln Gln Arg Leu Met Leu Glu Ile Ser Trp Glu Ala Leu

115 120 125

Glu Arg Ala Gly Phe Asp Pro Ser Ser Leu Arg Gly Ser Ala Gly Gly

130 135 140

Val Phe Thr Gly Val Gly Ala Val Asp Tyr Gly Pro Arg Pro Asp Glu
145 150 155 160

Ala Pro Glu Glu Val Leu Gly Tyr Val Gly Ile Gly Thr Ala Ser Ser

165 170 175

Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly Pro Ala

180 185 190

Val Thr Val Asp Thr Ala Cys Ser Ser Gly Leu Thr Ala Val His Leu

195 200 205

Ala Met Glu Ser Leu Arg Arg Asp Glu Cys Thr Leu Val Leu Ala Gly 

210 215 220 

Gly Val Thr Val Met Ser Ser Pro Gly Ala Phe Thr Glu Phe Arg Ser 
225 230 235 240 

Gln Gly Gly Leu Ala Glu Asp Gly Arg Cys Lys Pro Phe Ser Arg Ala

245 250 255 

Ala Asp Gly Phe Gly Leu Ala Glu Gly Ala Gly Val Leu Val Leu Gln

260 265 270 

Arg Leu Ser Val Ala Arg Ala Glu Gly Arg Pro Val Leu Ala Val Leu 

275 280 285 

Arg Gly Ser Ala Ile Asn Gln Asp Gly Ala Ser Asn Gly Leu Thr Ala

290 295 300 

Pro Ser Gly Pro Ala Gln Arg Arg Val Ile Arg Gln Ala Leu Glu Arg
305 310 315 320 

Ala Arg Leu Arg Pro Val Asp Val Asp Tyr Val Glu Ala His Gly Thr 

325 330 335 

Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala His Ala Leu Leu Asp Thr

340 345 350 

Tyr Gly Ala Asp Arg Glu Pro Gly Arg Pro Leu Trp Val Gly Ser Val 

355 360 365 

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Val

370 375 380 

Met Lys Thr Val Leu Ala Leu Arg His Arg Glu Ile Pro Ala Thr Leu
385 390 395 400 

His Phe Asp Glu Pro Ser Pro His Val Asp Trp Asp Arg Gly Ala Val 

405 410 415

Ser Val Val Ser Glu Thr Arg Pro Trp Pro Val Gly Glu Arg Pro Arg 

420 425 430 

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Val

435 440 445 

Ile Val Glu Glu Ala Pro Ser Pro Gln Ala Ala Asp Leu Asp Pro Thr

450 455 460 

Pro Gly Pro Ala Thr Gly Ala Thr Pro Gly Thr Asp Ala Ala Pro Thr 
465 470 475 480 

Ala Glu Pro Gly Ala Glu Ala Val Ala Leu Val Phe Ser Ala Arg Asp 

485 490 495 

Glu Arg Ala Leu Arg Ala Gln Ala Ala Arg Leu Ala Asp Arg Leu Thr

500 505 510 

Asp Asp Pro Ala Pro Ser Leu Arg Asp Thr Ala Phe Thr Leu Val Thr 

515 520 525 

Arg Arg Ala Thr Trp Glu His Arg Ala Val Val Val Gly Gly Gly Glu 

530 535 540 

Glu Val Leu Ala Gly Leu Arg Ala Val Ala Gly Gly Arg Pro Val Asp 
545 550 555 560 

Gly Ala Val Ser Gly Arg Ala Arg Ala Gly Arg Arg Val Val Leu Val 

565 570 575 

Phe Pro Gly Gln Gly Ala Gln Trp Gln Gly Met Ala Arg Asp Leu Leu

580 585 590 

Arg Gln Ser Pro Thr Phe Ala Glu Ser Ile Asp Ala Cys Glu Arg Ala

595 600 605 

Leu Ala Pro His Val Asp Trp Ser Leu Arg Glu Val Leu Asp Gly Glu 

610 615 620 

Gln Ser Leu Asp Pro Val Asp Val Val Gln Pro Val Leu Phe Ala Val
625 630 635 640 

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

645 650 655

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

660 665 670 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser 
675 680 685 
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Arg 
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Arg 


Arg 
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Gly 


Gly 
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Gly 


Gly 


Met 
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Gly 
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His 


Pro 


Asp 
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Ala 


Glu 


Arg 


Ile
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Arg 


Phe 


Ala 


Gly 


Ala 


705 
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720 


Leu Thr Val Ala Ser Val Asn Gly Pro Arg Ser Val Val Leu Ala Gly

725 730 735

Glu Asn Gly Pro Leu Asp Glu Leu Ile Ala Glu Cys Glu Ala Glu Gly

740 745 750

Val Thr Ala Arg Arg Ile Pro Val Asp Tyr Ala Ser His Ser Pro Gln

755 760 765

Val Glu Ser Leu Arg Glu Glu Leu Leu Ala Ala Leu Ala Gly Val Arg

770 775 780

Pro Val Ser Ala Gly Ile Pro Leu Tyr Ser Thr Leu Thr Gly Gln Val
785 790 795 800

Ile Glu Thr Ala Thr Met Asp Ala Asp Tyr Trp Phe Ala Asn Leu Arg

805 810 815

Glu Pro Val Arg Phe Gln Asp Ala Thr Arg Gln Leu Ala Glu Ala Gly

820 825 830

Phe Asp Ala Phe Val Glu Val Ser Pro His Pro Val Leu Thr Val Gly

835 840 845

Val Glu Ala Thr Leu Glu Ala Val Leu Pro Pro Asp Ala Asp Pro Cys

850 855 860

Val Thr Gly Thr Leu Arg Arg Glu Arg Gly Gly Leu Ala Gln Phe His
865 870 875 880

Thr Ala Leu Ala Glu Ala Tyr Thr Arg Gly Val Glu Val Asp Trp Arg

885 890 895

Thr Ala Val Gly Glu Gly Arg Pro Val Asp Leu Pro Val Tyr Pro Phe

900 905 910

Gln Arg Gln Asn Phe Trp Leu Pro Val Pro Leu Gly Arg Val Pro Asp

915 920 925

Thr Gly Asp Glu Trp Arg Tyr Gln Leu Ala Trp His Pro Val Asp Leu

930 935 940

Gly Arg Ser Ser Leu Ala Gly Arg Val Leu Val Val Thr Gly Ala Ala
945 950 955 960

Val Pro Pro Ala Trp Thr Asp Val Val Arg Asp Gly Leu Glu Gln Arg

965 970 975

Gly Ala Thr Val Val Leu Cys Thr Ala Gln Ser Arg Ala Arg Ile Gly

980 985 990

Ala Ala Leu Asp Ala Val Asp Gly Thr Ala Leu Ser Thr Val Val Ser

995 1000 1005

Leu Leu Ala Leu Ala Glu Gly Gly Ala Val Asp Asp Pro Ser Leu Asp

1010 1015 1020

Thr Leu Ala Leu Val Gln Ala Leu Gly Ala Ala Gly Ile Asp Val Pro
1025 1030 1035 1040

Leu Trp Leu Val Thr Arg Asp Ala Ala Ala Val Thr Val Gly Asp Asp

1045 1050 1055

Val Asp Pro Ala Gln Ala Met Val Gly Gly Leu Gly Arg Val Val Gly

1060 1065 1070



Val Glu Ser Pro Ala Arg Trp Gly Gly Leu Val Asp Leu Arg Glu Ala 

1075 1080 1085 

Asp Ala Asp Ser Ala Arg Ser Leu Ala Ala Ile Leu Ala Asp Pro Arg

1090 1095 1100 

Gly Glu Glu Gln Phe Ala Ile Arg Pro Asp Gly Val Thr Val Ala Arg
1105 1110 1115 1120 

Leu Val Pro Ala Pro Ala Arg Ala Ala Gly Thr Arg Trp Thr Pro Arg 

1125 1130 1135 

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu

1140 1145 1150 

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn 

1155 1160 1165 

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu 
1170 1175 1180

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp
1185 1190 1195 1200 

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gln Gly Arg

1205 1210 1215 

Val Val Thr Ala Val Phe His Ala Ala Gly Ile Ser Arg Ser Thr Ala

1220 1225 1230 

Val Gln Glu Leu Thr Glu Ser Glu Phe Thr Glu Ile Thr Asp Ala Lys

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser Ile Ala Trp Gly Leu Trp

1300 1305 1310 

Ala Gly Gln Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser

1315 1320 1325 

Gln Gly Leu Arg Ala Met Asp Pro Gln Arg Ala Ile Glu Glu Leu Arg

1330 1335 1340 

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp 
1345 1350 1355 1360 

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu 

1365 1370 1375 

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gln

1380 1385 1390 

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420 

Gly His Gly Thr Pro Thr Val Ile Glu Arg Asp Val Ala Phe Arg Asp
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr Ile Val Phe Asp His Pro

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gln Ala Pro Gly

1490 1495 1500 

Glu Ala Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Leu Ala
1505 1510 1515 1520 

Gly Gly Val Arg Thr Pro Asp Gln Leu Trp Asp Phe Ile Val Ala Asp

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565 

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly Ile Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Gln
1585 1590 1595 1600 

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly Ile Asp Pro

1605 1610 1615 

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr 

1620 1625 1630 

Gln Gly Tyr Gly Gln Asn Ala Gln Val Pro Lys Glu Ser Glu Gly Tyr

1635 1640 1645 

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr
1650 1655 1660 
Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser
1665 1670 1675 1680 

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710 

Glu Val Phe Thr Glu Phe Ser Arg Gln Gly Ala Leu Ala Pro Asp Gly

1715 1720 1725 

Arg Cys Lys Pro Phe Ser Asp Gln Ala Asp Gly Phe Gly Phe Ala Glu

1730 1735 1740 

Gly Val Ala Val Val Leu Leu Gln Arg Leu Ser Val Ala Val Arg Glu
1745 1750 1755 1760 

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gln Asp

1765 1770 1775 

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gln Gln Arg

1780 1785 1790 

Val Ile Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gln

1845 1850 1855 

Ala Ala Ala Gly Val Val Gly Val Ile Lys Val Val Leu Gly Leu Gly

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900 

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe 
1905 1910 1915 1920

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000 

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015 

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gln Gly Gly Gln Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val

2050 2055 2060 

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser 
2065 2070 2075 2080 

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gln Pro Val Leu Phe Val Val

2100 2105 2110 

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala 

2115 2120 2125 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val Val Ala

2130 2135 2140 

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala 
2145 2150 2155 2160

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu Ile Ala Pro Trp Ser Asp Arg

2180 2185 2190 

Ile Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly

2195 2200 2205 

Asp Pro Gln Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly

2210 2215 2220 

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His 
2225 2230 2235 2240 

Val Glu Gln Ile Arg Asp Thr Ile Leu Thr Asp Leu Ala Asp Val Thr

2245 2250 2255 

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala Ile Leu Pro Ala Thr His Pro

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gln Val Ala Asp His Arg Tyr Arg Val Asp Trp
2385 2390 2395 2400 

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro Ile Asp Cys Asp Gln Ala Met Val Trp Gly Ile

2500 2505 2510 

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val 

2515 2520 2525 

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala 

2530 2535 2540

Leu Leu Ala Ala Asp Asp His Glu Asp Gln Val Ala Leu Arg Asp Gly
2545 2550 2555 2560 

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575 

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr 

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620 

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640 
Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655 

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670 

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685 

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro 

2690 2695 2700 

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720 

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala 

2725 2730 2735 

Leu Ala His Arg Arg Arg Gln Ala Gly Arg Ala Ala Thr Ser Val Ala

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800 

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro 

2805 2810 2815 

Arg Pro Leu Ile Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro

2820 2825 2830 

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gln Leu Leu

2835 2840 2845 

Gln Phe Thr Arg Ser His Val Ala Ala Ile Leu Gly His Gln Asp Pro

2850 2855 2860 

Asp Ala Val Gly Leu Asp Gln Pro Phe Thr Glu Leu Gly Phe Asp Ser
2865 2870 2875 2880 

Leu Thr Ala Val Gly Leu Arg Asn Gln Leu Gln Gln Ala Thr Gly Arg

2885 2890 2895 

Thr Leu Pro Ala Ala Leu Val Phe Gln His Pro Thr Val Arg Arg Leu

2900 2905 2910 

Ala Asp His Leu Ala Gln Gln Leu Asp Val Gly Thr Ala Pro Val Glu

2915 2920 2925 

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gln Thr

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960 

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gln Leu Glu Leu

2965 2970 2975 

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val Ile Cys Cys Ala

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gin Pro Gly Tyr 

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gin Ala Asp Ala Val Leu Ala Ala Gin Gly Asp Thr Pro Phe Val Leu 

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gin Glu Ala Val His Ala Trp Leu Gly Glu Leu 

3090 3095 3100 

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg 
3105 3110 3115 3120 

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro 
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3125 3130 3135 

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gin Ser Thr Trp Pro Phe Gly His 

3155 3160 3165 

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gin Glu His 

3170 3175 3180 

Ala Asp Ala lie Ala Arg His He Asp Ala Trp Leu Ser Gly Glu Arg 
3185 3190 3195 3200 

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 16 



Met Asn Thr Thr Asp Arg Ala Val Leu Gly Arg Arg Leu Gln Met Ile
1 5 10 15

Arg Gly Leu Tyr Trp Gly Tyr Gly Ser Asn Gly Asp Pro Tyr Pro Met
20 25 30

Leu Leu Cys Gly His Asp Asp Asp Pro His Arg Trp Tyr Arg Gly Leu
35 40 45

Gly Gly Ser Gly Val Arg Arg Ser Arg Thr Glu Thr Trp Val Val Thr
50 55 60

Asp His Ala Thr Ala Val Arg Val Leu Asp Asp Pro Thr Phe Thr Arg
65 70 75 80

Ala Thr Gly Arg Thr Pro Glu Trp Met Arg Ala Ala Gly Ala Pro Ala
85 90 95

Ser Thr Trp Ala Gln Pro Phe Arg Asp Val His Ala Ala Ser Trp Asp
100 105 110

Ala Glu Leu Pro Asp Pro Gln Glu Val Glu Asp Arg Leu Thr Gly Leu
115 120 125

Leu Pro Ala Pro Gly Thr Arg Leu Asp Leu Val Arg Asp Leu Ala Trp
130 135 140

Pro Met Ala Ser Arg Gly Val Gly Ala Asp Asp Pro Asp Val Leu Arg
145 150 155 160

Ala Ala Trp Asp Ala Arg Val Gly Leu Asp Ala Gln Leu Thr Pro Gln
165 170 175

Pro Leu Ala Val Thr Glu Ala Ala Ile Ala Ala Val Pro Gly Asp Pro
180 185 190

His Arg Arg Ala Leu Phe Thr Ala Val Glu Met Thr Ala Thr Ala Phe
195 200 205

Val Asp Ala Val Leu Ala Val Thr Ala Thr Ala Gly Ala Ala Gln Arg
210 215 220

Leu Ala Asp Asp Pro Asp Val Ala Ala Arg Leu Val Ala Glu Val Leu
225 230 235 240

Arg Leu His Pro Thr Ala His Leu Glu Arg Arg Thr Ala Gly Thr Glu
245 250 255

Thr Val Val Gly Glu His Thr Val Ala Ala Gly Asp Glu Val Val Val
260 265 270

Val Val Ala Ala Ala Asn Arg Asp Ala Gly Val Phe Ala Asp Pro Asp
275 280 285

Arg Leu Asp Pro Asp Arg Ala Asp Ala Asp Arg Ala Leu Ser Ala Gln
290 295 300

Arg Gly His Pro Gly Arg Leu Glu Glu Leu Val Val Val Leu Thr Thr
305 310 315 320

Ala Ala Leu Arg Ser Val Ala Lys Ala Leu Pro Gly Leu Thr Ala Gly
325 330 335

Gly Pro Val Val Arg Arg Arg Arg Ser Pro Val Leu Arg Ala Thr Ala
340 345 350

His Cys Pro Val Glu Leu
355

<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 17 



Met Arg Val Val Phe Ser Ser Met Ala Ser Lys Ser His Leu Phe Gly
1 5 10 15

Leu Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg
20 25 30

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Ile Thr Ala Ala Gly Leu
35 40 45

Thr Ala Val Pro Val Gly Thr Asp Val Asp Leu Val Asp Phe Met Thr
50 55 60

His Ala Gly Tyr Asp Ile Ile Asp Tyr Val Arg Ser Leu Asp Phe Ser
65 70 75 80




Arg 


^\ jam. _ ~ 

ASp 


pro 


TV T _ 

Ala 


i nr 


O -V 


i nr 


Trp 


Asp 


His 


Leu 


Leu 


Gly 


Met 


Gin 










o 5 










90 










95 




T K t- 

1 nr 


va i 


.Leu 


T U, v> 

i nr 


pro 


i nr 


pne 


i yr 


Aia 


Leu 


Met 


ber 


T~l 

Pro 


ASp 


Ser 


Leu 








1 uu 










IUd 










110 






val 


CjIU 


(jiy 


rie u 


lie 


Cor 


pne 


f** tin 
i^ys 


Arg 


O y-v -w- 

ber 


Trp 


Arg 


Pro 


Asp 


Trp 


Ser 
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1 1 O 
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1 £ U 
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C i-. 
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Lai y 


pro 


\d± n 


i nr 


rne 


M.1 ca 


ai a 




TT- 


Aia 


TV 1 -i 

Aia 


I nr 


val 


Thr 


Gly 




1 JU 










1 -3D 










1 4 U 








Val 


Ala 


Li'-, 

HIS 


Ala 


Arg 


lieu 


Leu 


Trp 




pro 


TV A n 

Asp 


lie 


Thr 


Val 


Arg 


Ala 


1 4 O 










1 j u 










1 DO 










1 60 


Arg 


vj i n 


Ly s 


pne 


Leu 


oly 


iieu 


iieu 


pro 


vjly 


Gin 


D v ^>t. 

Pro 


Ala 


Ala 


His 


Arg 










1 DD 










1 / u 










1 /o 




pin 


MS p 


pro 


i_>e u 


Aia 




irp 


Leu 


i nr 


i rp 


ber 


vai 


CjIU 


Arg 


pne 


Gly 








i p n 










loo 










i yu 






C 1 \r 
uiy 


rtx. y 


v a X 


rl.U 


olll 




Vpi 1 


ri,i 






Val 


v a i 


oi y 


vjin 


Trp 


j. nr 
















Z U \) 










one. 








T 1 a 

lie 




rr ro 


r-ll a 


ir ro 


V ci 1 


(jiy 






i»eu 


ASp 


i nr 


biy 


Leu 


TV j— 

Arg 


rnr 




91 A 










on c 




















V cl JL 


ol y 


L Jfc: L. 


t\L y 


i yr 


VOX 


Z\ cr t-> 


i yr 


r\o 1 1 


oi y 


pro 


ici 


Val 


v a i 


pro 


ASp 












9 n 










O "5 ^ 










Z 4 U 


i rp 


T an 




nop 


v? -L li 




Thr 


nx. y 




niy 


Val 




Lieu 


i n r 


iieu 


ijiy 










^ ft J 




















£. OO 




lie 

X, J- V— 


Sp r 


Ser 


Ara 


Glu 

\A 


Asn 


Ser 


lie 


Gl v 


Gin 


Val 




VCl J. 


Sen 
n i> 


A cr j— k 
no 


T oi i 
J-it2 U 








260 










265 










270 






Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp
275 280 285

Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr
290 295 300

Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr
305 310 315 320

Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly
325 330 335

Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala
340 345 350

Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu
355 360 365

Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp
370 375 380

Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala
385 390 395 400

Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly
405 410 415

Glu Arg Thr Ala Val Gly
420

<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea
<400> 18

Met Ser Thr Asp Ala Thr His Val Arg Leu Gly Arg Cys Ala Leu Leu
1 5 10 15

Thr Ser Arg Leu Trp Leu Gly Thr Ala Ala Leu Ala Gly Gln Asp Asp
20 25 30

Ala Asp Ala Val Arg Leu Leu Asp His Ala Arg Ser Arg Gly Val Asn
35 40 45

Cys Leu Asp Thr Ala Asp Asp Asp Ser Ala Ser Thr Ser Ala Gln Val
50 55 60

Ala Glu Glu Ser Val Gly Arg Trp Leu Ala Gly Asp Thr Gly Arg Arg
65 70 75 80

Glu Glu Thr Val Leu Ser Val Thr Val Gly Val Pro Pro Gly Gly Gln
85 90 95

Val Gly Gly Gly Gly Leu Ser Ala Arg Gln Ile Ile Ala Ser Cys Glu
100 105 110

Gly Ser Leu Arg Arg Leu Gly Val Asp His Val Asp Val Leu His Leu
115 120 125

Pro Arg Val Asp Arg Val Glu Pro Trp Asp Glu Val Trp Gln Ala Val
130 135 140

Asp Ala Leu Val Ala Ala Gly Lys Val Cys Tyr Val Gly Ser Ser Gly
145 150 155 160

Phe Pro Gly Trp His Ile Val Ala Ala Gln Glu His Ala Val Arg Arg
165 170 175

His Arg Leu Gly Leu Val Ser His Gln Cys Arg Tyr Asp Leu Thr Ser
180 185 190

Arg His Pro Glu Leu Glu Val Leu Pro Ala Ala Gln Ala Tyr Gly Leu
195 200 205

Gly Val Phe Ala Arg Pro Thr Arg Leu Gly Gly Leu Leu Gly Gly Asp
210 215 220

Gly Pro Gly Ala Ala Ala Ala Arg Ala Ser Gly Gln Pro Thr Ala Leu
225 230 235 240

Arg Ser Ala Val Glu Ala Tyr Glu Val Phe Cys Arg Asp Leu Gly Glu
245 250 255

His Pro Ala Glu Val Ala Leu Ala Trp Val Leu Ser Arg Pro Gly Val
260 265 270

Ala Gly Ala Val Val Gly Ala Arg Thr Pro Gly Arg Leu Asp Ser Ala
275 280 285

Leu Arg Ala Cys Gly Val Ala Leu Gly Ala Thr Glu Leu Thr Ala Leu
290 295 300

Asp Gly Ile Phe Pro Gly Val Ala Ala Ala Gly Ala Ala Pro Glu Ala
305 310 315 320

Trp Leu Arg





























<210> 19 
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala
1 5 10 15

Arg Leu Tyr Cys Phe Pro His Ala Gly Ala Ala Ala Asp Ser Tyr Leu
20 25 30

Asp Leu Ala Arg Ala Leu Ala Pro Glu Val Asp Val Trp Ala Val Gln
35 40 45

Tyr Pro Gly Arg Gln Asp Arg Arg Asp Glu Arg Ala Leu Gly Thr Ala
50 55 60

Gly Glu Ile Ala Asp Glu Val Ala Ala Val Leu Arg Asp Leu Val Gly
65 70 75 80

Glu Val Pro Phe Ala Leu Phe Gly His Ser Met Gly Ala Leu Val Ala
85 90 95

Tyr Glu Thr Ala Arg Arg Leu Glu Ala Arg Pro Gly Val Arg Pro Leu
100 105 110

Arg Leu Phe Val Ser Gly Gln Thr Ala Pro Arg Val His Glu Arg Arg
115 120 125

Thr Asp Leu Pro Asp Glu Asp Gly Leu Val Glu Gln Met Arg Arg Leu
130 135 140

Gly Val Ser Glu Ala Ala Leu Ala Asp Gln Gly Leu Leu Asp Met Ser
145 150 155 160

Leu Pro Val Leu Arg Ala Asp His Arg Val Leu Arg Ser Tyr Ala Trp
165 170 175

Gln Ala Gly Pro Pro Leu Arg Ala Gly Ile Thr Thr Leu Cys Gly Asp
180 185 190

Thr Asp Pro Leu Thr Thr Val Glu Asp Ala Gln Arg Trp Leu Pro Tyr
195 200 205

Ser Val Val Pro Gly Arg Thr Arg Thr Phe Pro Gly Gly His Phe Tyr
210 215 220

Leu Ala Asp His Val Gly Glu Val Ala Glu Ser Val Ala Pro Asp Leu
225 230 235 240

Leu Arg Leu Thr Pro Thr Gly
245



<210> 20 
<211> 189 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 20 



Ile Arg Val Gln Asp Asp Asp Ala Asp Arg Leu Ser Arg Asp Glu Leu
1 5 10 15

Thr Ser Ile Ala Leu Val Leu Leu Leu Ala Gly Phe Glu Ala Ser Val
20 25 30

Ser Leu Ile Gly Ile Gly Thr Tyr Leu Leu Leu Thr His Pro Asp Gln
35 40 45

Leu Ala Leu Val Arg Lys Asp Pro Ala Leu Leu Pro Gly Ala Val Glu
50 55 60

Glu Ile Leu Arg Tyr Gln Ala Pro Pro Glu Thr Thr Thr Arg Phe Ala
65 70 75 80

Thr Ala Glu Val Glu Ile Gly Gly Val Thr Ile Pro Ala Tyr Ser Thr
85 90 95

Val Leu Ile Ala Asn Gly Ala Ala Asn Arg Asp Pro Gly Gln Phe Pro
100 105 110

Asp Pro Asp Arg Phe Asp Val Thr Arg Asp Ser Arg Gly His Leu Thr
115 120 125

Phe Gly His Gly Ile His Tyr Cys Met Gly Arg Pro Leu Ala Lys Leu
130 135 140

Glu Gly Glu Val Ala Leu Gly Ala Leu Phe Asp Arg Phe Pro Lys Leu
145 150 155 160

Ser Leu Gly Phe Pro Ser Asp Glu Val Val Trp Arg Arg Ser Leu Leu
165 170 175

Arg Gly Asp His Leu Pro Val Arg Pro Asn Gly

<210> 21
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 
<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33 



<210> 22
<211> 39
<212> DNA

<213> Artificial Sequence
<220>

<223> Complementary oligo
<400> 22

aattgtctag agctgaggcc agatctccga attcttaat 39
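
The relationship between SEQ ID NO: 21 and its complementary oligo, SEQ ID NO: 22, can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: the helper names `reverse_complement` and `overhangs` are illustrative, and the sketch assumes that the two oligos pair over the full 33 nucleotides of SEQ ID NO: 21, leaving the extra nucleotides of SEQ ID NO: 22 unpaired at its ends.

```python
# Illustrative check only; helper names are hypothetical.
SEQ21 = "taagaattcggagatctggcctcagctctagac"        # SEQ ID NO: 21 (33 nt)
SEQ22 = "aattgtctagagctgaggccagatctccgaattcttaat"  # SEQ ID NO: 22 (39 nt)

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs(top: str, bottom: str):
    """Return the (5' end, 3' end) of `bottom` left unpaired when `bottom`
    pairs with all of `top`, or None if `bottom` cannot pair with all of `top`."""
    paired = reverse_complement(top)
    start = bottom.find(paired)
    if start < 0:
        return None
    return bottom[:start], bottom[start + len(paired):]


print(overhangs(SEQ21, SEQ22))
```

Under these assumptions the sketch reports that SEQ ID NO: 22 contains the exact reverse complement of SEQ ID NO: 21, flanked by `aatt` at its 5' end and `at` at its 3' end, which accounts for the difference in length (39 versus 33 nucleotides).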



<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 



<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 



<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 



<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 



<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 



<220>
<221> misc_feature
<222> (1)...(528)

<223> Sequence with codon changes as described in the
specification at page 99, line 22 thru page 101, line 23



<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528
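
SEQ ID NO: 25 is described as SEQ ID NO: 24 "with codon changes". One way to illustrate what that means is to translate the same stretch of both sequences and compare the results. The sketch below is a hedged illustration in Python, not part of the specification: it uses the standard genetic code, assumes the reading frame begins at nucleotide 1, and compares only the first 60 nucleotides of each sequence; the function name `translate` is illustrative.

```python
# Standard genetic code, built compactly. Codons are enumerated with each
# position running over t, c, a, g, which matches the order of AMINO below.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA (spaces ignored) in frame 1; a trailing partial codon is dropped."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nucleotides of SEQ ID NO: 24 and SEQ ID NO: 25 (assumed frame: nucleotide 1).
SEQ24_FIRST60 = "ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt"
SEQ25_FIRST60 = "ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc"

print(translate(SEQ24_FIRST60))
print(translate(SEQ25_FIRST60))
print(translate(SEQ24_FIRST60) == translate(SEQ25_FIRST60))
```

In this frame both stretches translate to the same twenty residues (LQRLSVAVREGRRVLGVVVG), so over this window the nucleotide differences are synonymous codon substitutions. The same comparison can be extended to the full 528 nucleotides.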



<210> 26 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 26 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291

<210> 27
<211> 291
<212> DNA

<213> Micromonospora megalomicea
<400> 27

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120

ggtgatggtg tcgttggcgc ggttgtgg-g gtggtgtggg gttgtgcctg cggcggtggt 180

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291

<210> 28
<211> 291
<212> DNA

<213> Micromonospora megalomicea
<220>

<221> misc_feature
<222> (1)...(291)

<223> Sequence with codon changes as described in the
specification at page 99, line 22 thru page 101, line 23

<400> 28

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291

<210> 29
<211> 24
<212> DNA

<213> Artificial Sequence
<220>

<223> PCR primer
<400> 29

gaacaactcc tgtctgcggc cgcg 24

<210> 30 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 31 

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 51 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 25 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 16

<210> 34 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 34

Title

Recombinant Megalomicin Biosynthetic Genes And Uses Thereof

Cross-Reference to Priority Application
This application claims priority to provisional U.S. patent application
Serial No. 60/158,305, filed 8 October 1999, and provisional U.S. patent
application Serial No. 60/190,024, filed 17 March 2000 under 35 U.S.C. § 119(e).
The content of the above referenced applications is incorporated herein by
reference in its entirety.



Field of the Invention 
The present invention provides recombinant methods and materials for 
producing polyketides by recombinant DNA technology. The invention relates to 
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology. 



Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There are a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinosyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326;
McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.
Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference.

Polyketides are synthesized in nature by polyketide synthase (PKS)
enzymes. These enzymes, which are complexes of multiple large proteins, are
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS
enzymes are known; these differ in their composition and mode of synthesis.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-,
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300,000 Da) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D. The polyketide
metabolites; E. Horwood: New York, 1991, incorporated herein by reference).
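
The ORF, module, and domain hierarchy described above can be pictured as a small data model. The following Python sketch is an illustration only, under stated assumptions: the class names are hypothetical, the domain labels (`AT`, `ACP`, `KS`) are conventional abbreviations that this passage does not itself define, and the only rule encoded is the one given in the text (at least two domains for a loading module, at least three for the simplest extender module).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One PKS module: an ordered list of enzymatic activities ("domains")."""
    name: str
    domains: List[str]
    is_loading: bool = False

    def min_domains(self) -> int:
        # Per the text: at least two domains for a loading module,
        # at least three for the simplest extender module.
        return 2 if self.is_loading else 3

    def is_valid(self) -> bool:
        return len(self.domains) >= self.min_domains()


@dataclass
class OpenReadingFrame:
    """One ORF of a modular PKS; it may carry one, two, or more modules."""
    name: str
    modules: List[Module]

    def is_valid(self) -> bool:
        return len(self.modules) >= 1 and all(m.is_valid() for m in self.modules)


# Hypothetical example; the domain labels are assumptions, not taken from this passage.
loading = Module("loading", ["AT", "ACP"], is_loading=True)
extender = Module("extender 1", ["KS", "AT", "ACP"])
orf = OpenReadingFrame("hypothetical ORF", [loading, extender])
print(orf.is_valid())
```

The model deliberately says nothing about which domains a real megalomicin PKS module contains; it only captures the counting rule stated in the paragraph.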

During the past half decade, the study of modular PKS function and
specificity has been greatly facilitated by the plasmid-based Streptomyces
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512, McDaniel et
al., 1993, Science 262: 1546-1557, and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages to
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora
megalomicea, a member of the Actinomycetales family of soil bacteria that
produces many types of biologically active compounds. Megalomicin is a
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor and
S. lividans and other host cells were possible. The present invention meets these
and other needs.

Summary of the Invention 
The present invention provides recombinant methods and materials for
expressing PKS enzymes and polyketide modification enzymes derived in whole
or in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that
constitute the complete PKS that ultimately results, in Micromonospora
megalomicea, in the production of megalomicin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender
modules 1 through 6, inclusive, of the megalomicin PKS.

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment,
the promoter is derived from another PKS gene. In a related embodiment, the
invention provides recombinant host cells comprising one or more expression
vectors that produce(s) megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.

In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the
library are synthesized by hybrid PKS enzymes of the invention. The resulting
polyketides can be further modified to convert them to other useful compounds,
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are
useful intermediates in the preparation of antiparasitics are of particular benefit.
In another related embodiment, the invention provides a method to prepare
a nucleic acid that encodes a modified PKS, which method comprises using the
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or
directly from DNA derived from organisms that produce megalomicin. In
addition, the invention provides methods to screen the resulting polyketide and
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics,
antiparasitics and other useful compounds derived therefrom. The compounds of
the invention can also be used in the manufacture of another compound. In a
preferred embodiment, the compounds of the invention are formulated in a
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid 
fragment comprising a nucleotide sequence encoding a domain of megalomicin 
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated 
nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound. 

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORF) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode a single, multiple, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment
encodes the loading module, a thioesterase domain, and all six extender modules
of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth
in the SEQ. ID NO:1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists or consists essentially of a nucleic acid
having a nucleotide sequence set forth in the SEQ. ID NO:1.

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

In still another specific embodiment, the invention provides an antibody, or
a fragment or derivative thereof, which immuno-specifically binds to a domain of
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme.
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant
host cell comprising the above-described recombinant DNA expression vector
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than
one megalomicin PKS or megalomicin modification enzyme, or domains thereof,
such nucleic acid material can be located at a single genetic locus, e.g., on a single
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell, which comprises at least two separate
autonomously replicating recombinant DNA expression vectors, and each of said
vectors comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell, which comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, each of said vector(s) and each of said modified
chromosome comprises a recombinant DNA compound encoding a megalomicin
PKS domain or a megalomicin modification enzyme operably linked to a
promoter. Preferably, the autonomously replicating recombinant DNA expression
vector and/or the modified chromosome further comprises distinct selectable
markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or
more of the meg PKS genes contains one or more domain alterations, such as a
deletion or substitution of a meg PKS domain with a domain from another PKS.

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.

These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below. 

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp (i.e., 5K, 10K, and the like) increments on a solid
bar beneath the arrows indicating the genes.

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene
cluster. The various open reading frames are shown as arrows pointing in the
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS0138B and
pKOS0124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are
indicated.

Figure 6 shows the biosynthetic pathway for the formation of desosamine,
rhodosamine, and mycarose, as well as the genes that produce the various enzymes
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora
megalomicea megalomicin biosynthetic genes (GenBank Accession No.
AF263245, incorporated herein by reference).

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and
certain cosmids of the invention that comprise portions of the cluster.

Figure 10 depicts the biosynthesis of megosamine, mycarose, and
desosamine.

Detailed Description of the Invention 
The present invention provides useful compounds and methods for
producing polyketides in recombinant host cells. As used herein, the term
recombinant refers to a compound or composition produced by human
intervention. The invention provides recombinant DNA compounds encoding all
or a portion of the megalomicin biosynthetic genes. The invention provides
recombinant expression vectors useful in producing the megalomicin PKS and
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host
cells. The invention also provides the polyketides produced by the recombinant
PKS and polyketide modification enzymes.

To appreciate the many and diverse benefits and applications of the
invention, the description of the invention below is organized as follows. In
Section I, common definitions used throughout this application are provided. In
Section II, structural and functional characteristics of megalomicin are described.
In Section III, the recombinant megalomicin biosynthetic genes and other
recombinant nucleic acids provided by the invention are described. In Section IV,
polypeptides and proteins encoded by the megalomicin biosynthetic genes and
antibodies that specifically bind to such polypeptides and proteins provided by the
invention are described. In Section V, methods for heterologous expression of the
megalomicin biosynthetic genes provided by the invention are described. In
Section VI, the hybrid PKS genes provided by the invention are described. In
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention
and pharmaceutical compositions of those compounds are described. The detailed
description is followed by working examples illustrating the invention.

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.

Section I. Definitions 

As used herein, domain refers to a portion of a molecule, e.g., proteins or
nucleic acids, that is structurally and/or functionally distinct from another portion
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain. 

As used herein, biological activity refers to the in vivo activities of a
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses
therapeutic effects and pharmaceutical activity of such compounds, compositions
and mixtures. Biological activities may be observed in in vitro systems designed
to test or use such activities.

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution,
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination
thereof.

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally
associated refers to the functional relationship of DNA with regulatory and
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem. 266:19867-19870
(1991)) can be inserted immediately 5' of the start codon and may enhance
expression. The desirability of (or need for) such modification may be empirically
determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment
of DNA or RNA that controls transcription of the DNA or RNA to which it is
operatively linked. The promoter region includes specific sequences that are
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be cis
acting or may be responsive to trans acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.

As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.
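
By way of illustration only, and without limitation, the three stringency conditions defined above can be recorded as a small lookup table. The following Python sketch is not part of the original disclosure; the names WashCondition, STRINGENCY, and describe are hypothetical and chosen for this example, and only the numerical values are taken from the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashCondition:
    """One hybridization condition (hypothetical helper type for this sketch)."""
    sspe_x: float       # SSPE concentration, as a multiple of the 1 x stock
    sds_percent: float  # SDS concentration, in percent
    temp_c: int         # temperature, in degrees Celsius


# Values copied from the stringency definition in the text.
STRINGENCY = {
    "high": WashCondition(sspe_x=0.1, sds_percent=0.1, temp_c=65),
    "medium": WashCondition(sspe_x=0.2, sds_percent=0.1, temp_c=50),
    "low": WashCondition(sspe_x=1.0, sds_percent=0.1, temp_c=50),
}


def describe(level: str) -> str:
    """Return a one-line description of the named stringency level."""
    c = STRINGENCY[level]
    return f"{level} stringency: {c.sspe_x} x SSPE, {c.sds_percent}% SDS, {c.temp_c}°C"


if __name__ == "__main__":
    for name in ("high", "medium", "low"):
        print(describe(name))
```

In this table the salt concentration decreases, and the temperature does not decrease, as stringency increases, which is the general trend one would expect for such conditions.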

The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at 
least 70%, preferably means at least 80%, more preferably at least 90%, and most 
preferably at least 95% identity. 
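
For illustrative purposes, the identity thresholds recited above can be applied to a pair of sequences as sketched below in Python. The sketch assumes, as a simplification not stated in the text, that the two sequences have already been aligned and are of equal length, and it ignores gap handling; actual alignment programs may compute identity differently. The names percent_identity, TIERS, and identity_tier are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Thresholds taken from the definition in the text, highest first.
TIERS = [
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (70.0, "substantially identical"),
]


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the highest tier whose threshold it meets."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return "below threshold"


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, identity_tier(pct))
```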

As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.

As used herein, isolated means that a substance is either present in a
preparation at a concentration higher than that substance is found in nature or in its
naturally occurring state or that the substance is present in a preparation that
contains other materials with which the substance is not associated in nature.
As an example of the latter, an isolated meg PKS protein includes a meg PKS
protein expressed in a Streptomyces coelicolor or S. lividans host cell.

As used herein, substantially pure means sufficiently homogeneous to
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.

As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome.

Section II. Megalomicins 
The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are
6-O-glycosides of erythromycin C with acetyl or propionyl groups esterified at the
3''' or 4''' hydroxyls of the mycarose sugar at the C-3 position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials,
they act like the erythromycins, which inhibit protein synthesis at the translocation
step by selective binding to the bacterial 50S ribosomal RNA. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell. Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (Toxso, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:207-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an
IC50 of 1 µg/ml in blocking intracellular replication of Plasmodium falciparum
infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 µg/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 µg/ml, respectively).
Megalomicin is also active against the intracellular replicative, amastigote form of
T. cruzi, completely preventing its replication in infected murine LLC/MK2
macrophages at a dose of 5 µg/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections.
Because the erythromycins do not have such activity, although azithromycin
(Figure 3) has been reported to be an effective acute and prophylactic treatment for
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be
developed into potent antimalarial drugs with a high therapeutic index and be
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread
use of the erythromycins and their good oral availability plus the low mammalian
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44: 271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through
successive condensations of activated coenzyme-A thioester monomers derived
from small organic acids such as acetate, propionate, and butyrate. Active sites
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing activities.
Active sites that perform these reactions include a ketoreductase (KR),
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain,
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g.,
glycosylation, oxidation, acylation) to achieve the final active compound.
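
The relationship between the beta-keto processing domains present in a module and the resulting functional group, as described in the preceding paragraph, can be sketched as a simple rule. The following Python example is illustrative only; the function name beta_carbon_state is hypothetical, and the treatment of domain combinations not described in the text (for example, a DH without a KR) as yielding an unreduced ketone is an assumption made for this sketch.

```python
def beta_carbon_state(domains) -> str:
    """Return the functional group expected at the beta-carbon.

    `domains` is any iterable of domain abbreviations found in a module,
    for example {"KS", "AT", "KR", "ACP"}. Only KR, DH and ER are examined.
    """
    present = set(domains)
    if "KR" not in present:
        # No ketoreductase: the beta-keto group is left untouched.
        # (Assumption: DH or ER without KR is treated as inactive.)
        return "ketone"
    if "DH" not in present:
        return "hydroxyl"  # KR alone
    if "ER" not in present:
        return "alkene"    # KR and DH
    return "alkane"        # KR, DH and ER: complete reduction


if __name__ == "__main__":
    for combo in (set(), {"KR"}, {"KR", "DH"}, {"KR", "DH", "ER"}):
        print(sorted(combo), "->", beta_carbon_state(combo))
```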

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No.
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the
most characterized and extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the
enzymatic steps for each round of condensation and reduction are encoded within
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large
polypeptides encoded by three open reading frames (ORFs, designated eryAI,
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no
functional reductive domain. Following the condensation and appropriate
dehydration and reduction reactions, the enzyme bound intermediate is lactonized
by the TE at the end of extender module 6 to form 6-dEB.

More particularly, the loading module of DEBS consists of two domains,
an acyl-transferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of
the loading module migrates to form a thiol ester (trans-esterification) at the KS of
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit
(elongation or extension). The growing polyketide chain is transferred from the
ACP to the KS of the next module, and the process continues.

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta keto group of each
two-carbon unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.
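
As a worked illustration of the assembly-line process described above, the following Python sketch tabulates, for each DEBS extender module, the backbone length after that module and the state of the newly formed beta-carbon. The reductive domains per module are those recited above for DEBS. The carbon bookkeeping rests on assumptions not spelled out in the text, namely that the propionyl starter contributes three carbons and that each methylmalonyl extender contributes two backbone carbons plus one methyl branch after decarboxylation. All names (DEBS_MODULES, beta_state, assembly_line) are hypothetical.

```python
# Reductive domains in each DEBS extender module, as recited in the text.
DEBS_MODULES = {
    1: {"KR"},
    2: {"KR"},
    3: set(),               # no functional reductive domain
    4: {"KR", "DH", "ER"},  # complete reductive set
    5: {"KR"},
    6: {"KR"},
}

STARTER_CARBONS = 3   # propionyl starter unit (assumption: 3 carbons)
BACKBONE_PER_STEP = 2  # each module lengthens the backbone by two carbons
TOTAL_PER_STEP = 3     # two backbone carbons plus one methyl branch (assumption)


def beta_state(domains) -> str:
    """Functional group left at the beta-carbon for a given domain set."""
    if "KR" not in domains:
        return "ketone"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "alkene"
    return "alkane"


def assembly_line(modules):
    """Return (module, backbone carbons, total carbons, beta state) per step."""
    backbone = total = STARTER_CARBONS
    steps = []
    for number in sorted(modules):
        backbone += BACKBONE_PER_STEP
        total += TOTAL_PER_STEP
        steps.append((number, backbone, total, beta_state(modules[number])))
    return steps


if __name__ == "__main__":
    for step in assembly_line(DEBS_MODULES):
        print(step)
```

Under these assumptions the chain leaving extender module 6 has a 15-carbon backbone and 21 carbons in total, which is consistent with the molecular formula of 6-dEB (C21H38O6).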

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, i.e., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A involve the
actions of a number of modification enzymes, which carry out C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via
O-methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 


Section III: The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments 

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin-producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids. Analysis of these cosmids by DNA sequencing
and restriction enzyme digestion revealed that the desired DNA
had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 also shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ______; cosmid pKOS079-124B is available
under accession no. ATCC ______; cosmid pKOS079-93D is available under
accession no. ATCC ______; and cosmid pKOS079-93A is available under
accession no. ATCC ______). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown in Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of
the genetic code, a variety of DNA compounds differing in their nucleotide
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the
invention includes DNA compounds of any sequence that encode the amino acid
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a
desired activity. The present invention includes such polypeptides with alternate
amino acid sequences, and the amino acid sequences encoded by the DNA
sequences shown herein merely illustrate preferred embodiments of the invention.
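
By way of illustration only, the degeneracy of the genetic code can be shown with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1); the function names and the twelve-nucleotide test sequences are hypothetical and are not taken from SEQ ID NO: 1.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered over bases T, C, A, G with the
# third position varying fastest ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    degeneracy = {}
    for aa in CODON_TABLE.values():
        degeneracy[aa] = degeneracy.get(aa, 0) + 1
    return prod(degeneracy[aa] for aa in protein)


if __name__ == "__main__":
    # Two different DNA sequences, one encoded peptide.
    print(translate("ATGGCCTGCGAA"), translate("ATGGCGTGTGAG"))
    print(count_encodings("MACE"))
```

Even for a four-residue peptide there are several equivalent coding sequences; for a full-length PKS protein the number is astronomically large.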

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form.
The DNA molecules of the invention comprise one or more sequences that encode
one or more domains (or fragments of such domains) of one or more modules in
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and
TE domains of at least one of the 6 extender modules and loading module of the
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster.
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either
case, the vector can be a stable vector (i.e., the vector remains present over many
cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as
6-dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and(or) C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as
N-dimethyl-L-daunosamine; the daunosamine genes have been characterized from
Streptomyces peucetius (see Colombo and Hutchinson, J. Indust. Microbiol.
Biotechnol., in press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and
references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline-producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway on a
heterologous host.

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-3262;
Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Natl. Acad. Sci. USA 95:12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.

The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO: 1 (see also Figure 7; note that some gene
designations may be different in Figure 7).



Table 1. Megalomicin Biosynthetic Gene Cluster
Micromonospora megalomicea subsp. nigra (ATCC 27598)

Location                    Description

1..2451                     sequence from cosmid pKOS079-138B
complement(1..144)          megBVI (or megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase
928..2061                   megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase
2072..3382                  megDI, TDP-megosaminyl transferase (eryCIII homolog)
2452..40397                 sequence of cosmid pKOS079-93D
3462..4634                  megG (or megY), mycarosyl acyltransferase
4651..5775                  megDII, deoxysugar transaminase (eryCI, DnrJ homolog)
5822..6595                  megDIII, TDP-daunosaminyl-N,N-dimethyltransferase (eryCVI homolog)
6592..7197                  megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU homolog)
7220..8206                  megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog)
complement(8228..9220)      megBIIA or megDVII, TDP-4-keto-L-6-deoxy-hexose 2,3-reductase
complement(9226..10479)     megBV, TDP-mycarosyl transferase
complement(10483..11424)    megBIV, TDP-hexose 4-ketoreductase
12181..22821                megAI
12181..13791                Loading Module (L)
12505..13470                AT-L
13576..13791                ACP-L
13849..18207                Extender Module 1 (1)
13849..15126                KS1
15427..16476                AT1
17155..17694                KR1
17947..18207                ACP1
18268..22575                Extender Module 2 (2)
18268..19548                KS2
19876..20910                AT2
21517..22053                KR2
22318..22575                ACP2
22867..33555                megAII
22957..27258                Extender Module 3 (3)
22957..24237                KS3
24544..25581                AT3
26230..26733                KR3 (inactive)
26998..27258                ACP3
27313..33312                Extender Module 4 (4)
27393..28590                KS4
28897..29931                AT4
29953..30477                DH4
31396..32244                ER4
32257..32799                KR4
33052..33312                ACP4
33666..43271                megAIII
33780..38120                Extender Module 5 (5)
33780..35027                KS5
35385..36419                AT5
37068..37604                KR5
37860..38120                ACP5
38187..42425                Extender Module 6 (6)
38187..39470                KS6
39795..40811                AT6
40398..46641                sequences from cosmid pKOS079-93A
41406..41936                KR6
42168..42425                ACP6
42585..43271                TE
43268..44344                megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase
44355..45623                megCIII, TDP-desosaminyl transferase
45620..46591                megBII, TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase
complement(46660..47403)    megH, TEII
complement(47411..47980)    megF, C-6 hydroxylase
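
The location entries of Table 1 can be read programmatically. The following Python sketch is offered for illustration only: the helper names are hypothetical, only a few rows of Table 1 are copied in, and the coordinates are assumed to be 1-based and inclusive, with "complement(...)" denoting a feature on the opposite strand.

```python
import re

# A few rows copied from Table 1; the helpers below are illustrative.
ROWS = [
    ("12181..22821", "megAI"),
    ("13849..15126", "KS1"),
    ("17947..18207", "ACP1"),
    ("complement(46660..47403)", "megH, TEII"),
]


def parse_location(text):
    """Return (start, end, strand) for 'a..b' or 'complement(a..b)'."""
    text = text.strip()
    match = re.fullmatch(r"complement\((\d+)\.\.(\d+)\)", text)
    strand = "-"
    if match is None:
        match = re.fullmatch(r"(\d+)\.\.(\d+)", text)
        strand = "+"
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    return int(match.group(1)), int(match.group(2)), strand


def span_length(text):
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    start, end, _ = parse_location(text)
    return end - start + 1


def contains(outer, inner):
    """True if the inner location lies entirely within the outer one."""
    o_start, o_end, _ = parse_location(outer)
    i_start, i_end, _ = parse_location(inner)
    return o_start <= i_start and i_end <= o_end


if __name__ == "__main__":
    for location, name in ROWS:
        print(name, parse_location(location), span_length(location))
```

Under these assumptions, a domain such as KS1 can be checked to fall within the megAI ORF, and each span length can be checked for divisibility by three.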

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of the
megalomicin polyketide synthase or a megalomicin modification enzyme. The
isolated nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence
that is complementary to the nucleotide sequence encoding a domain of
megalomicin PKS or a megalomicin modification enzyme is also provided.

The isolated nucleic acid fragment can comprise a single, multiple, or all of
the open reading frames (ORFs) of the megalomicin PKS or the megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acids of the invention
also include nucleic acids that encode one or more domains and one or more
modules of the megalomicin PKS. Exemplary domains of the megalomicin PKS
include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain,
and all six extender modules of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by megF,
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose,
megosamine or desosamine, which are used as biosynthetic intermediates in the
biosynthesis of various megalomicin species and other related polyketides. The
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of
or entire megalomicin biosynthetic genes are collectively referred to as
megalomicin biosynthetic nucleic acid(s).

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO: 1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be
single- or double-stranded. Nucleic acids that hybridize to or are complementary to
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the
strand so that the inverse complement would hybridize without mismatches to the
nucleic acid strand) are also provided. In specific aspects, nucleic acids are
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene.
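
The inverse complement defined above can be computed directly. The following Python sketch is illustrative only; the function name and the six-nucleotide example are hypothetical, and only the four unambiguous DNA bases are handled.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def inverse_complement(strand: str) -> str:
    """Complement every base, then reverse the result.

    The returned strand runs in the opposite orientation and would pair
    with the input strand without mismatches.
    """
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(inverse_complement("ATGGCC"))
```

Applying the function twice returns the original strand, which is a convenient sanity check.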

The megalomicin biosynthetic nucleic acids provided herein include those
with nucleotide sequences encoding substantially the same amino acid sequences
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV.

Some regions within the megalomicin PKS genes are highly homologous
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences
encoded by said changed DNA codons (see the examples below). The stability of
the nucleic acid or DNA can also be improved by codon changes that reduce or
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from the eryAIII, oleAIII,
picAIII, or picAIV genes.
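
The codon-change approach described above can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions, not the procedure used in the examples of this specification: it assumes the standard genetic code, the function names and the short test sequence are hypothetical, and it simply picks the first alternative synonymous codon without regard to the codon usage of the intended host.

```python
from itertools import product

# Standard genetic code, codons ordered over T, C, A, G ('*' is stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(dna):
    """Translate an in-frame, upper-case DNA sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def recode(dna):
    """Replace each codon with a different synonymous codon where one exists."""
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    repeat = "GCCCTGGTGGAA"          # hypothetical repeated segment
    altered = recode(repeat)
    print(translate(repeat) == translate(altered), identity(repeat, altered))
```

The recoded copy encodes the same peptide while sharing fewer nucleotide positions with the original, which is the property relied upon to reduce recombination between repeated regions.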

The recombinant DNA compounds of the invention that encode the
megalomicin PKS and modification proteins or portions thereof are useful in a
variety of applications. While many of these applications relate to the heterologous
expression of the megalomicin biosynthetic genes or the construction of hybrid
PKS enzymes, many useful applications involve the natural megalomicin producer
Micromonospora megalomicea. For example, one can use the recombinant DNA
compounds of the invention to disrupt the megalomicin biosynthetic genes by
homologous recombination in Micromonospora megalomicea. The resulting host
cell is a preferred host cell for making polyketides modified by oxidation,
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin,
because the genes that encode the proteins that perform these reactions are of
course present in the host cell, and because the host cell does not produce
megalomicin that could interfere with production or purification of the polyketide
of interest.

One illustrative recombinant host cell provided by the present invention
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding modules
of another PKS. The utility of these constructs is that host cells expressing, or
cell-free extracts containing, a PKS comprising the protein encoded thereby can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to prepare a
polyketide of interest. See U.S. patent application Serial No. 09/492,773, filed 27
Jan. 2000, and PCT patent publication No. 00/44717, both of which are
incorporated herein by reference. Such KS1° constructs of the invention are useful
in the production of 13-substituted-megalomicin compounds in Micromonospora
megalomicea host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl.

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and
provide monoketide substrates for processing by the remainder of the PKS.

The compounds of the invention can also be used to construct recombinant
host cells of the invention in which coding sequences for one or more domains or
modules of the megalomicin PKS or for another megalomicin biosynthetic gene
have been deleted by homologous recombination with the Micromonospora
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the
compounds used in the recombination process are characterized by their homology
with the chromosomal DNA and not by encoding a functional protein due to their
intended function of deleting or otherwise altering portions of chromosomal DNA.
For this and a variety of other applications, the compounds of the present
invention include not only those DNA compounds that encode functional proteins
but also those DNA compounds that are complementary or identical to any portion
of the megalomicin biosynthetic genes.

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173:7004-11; and Takada et al., 1994, J. Antibiot. 47:1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is
desired to replace the disrupted function with a gene product expressed by a
recombinant DNA expression vector. While such expression vectors of the
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this
type include those in which the coding sequence for the loading module has been
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated or
disrupted.

While the present invention provides many useful compounds having
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes
in cells other than M. megalomicea, as described in Section V.

Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO: 1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known
in the art for nucleic acid hybridization and cloning (e.g., as described in Section
III) in accordance with the methods of the present invention.

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can
be supplied by the native promoter for a megalomicin biosynthetic gene and/or
flanking regions.

A variety of host-vector systems may be utilized to express the protein
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used.

In a specific embodiment, a vector is used that comprises a promoter
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic
enzyme, or a domain, fragment, derivative or homolog thereof, one or more
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene).

Expression vectors containing the sequences of interest can be identified
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be
detected by nucleic acid hybridization to probes comprising sequences
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics,
occlusion body formation in baculovirus, and the like) caused by insertion of the
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be
identified by the absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying for the megalomicin
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression
vectors can be propagated and amplified in quantity. As previously described, the
expression vectors or derivatives which can be used include, but are not limited to:
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors.

In addition, a host cell strain may be chosen that modulates the expression
of the inserted sequences, or modifies or processes the expressed proteins in the
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically-engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation, phosphorylation,
and the like) of proteins. Appropriate cell lines or host systems can be chosen to
ensure the desired modification and processing of the foreign protein is achieved.
For example, expression in a bacterial system can be used to produce an
unglycosylated core protein, while expression in mammalian cells ensures
"native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by
altering their sequences by substitutions, additions or deletions that provide for
functionally equivalent molecules. Due to the degeneracy of nucleotide coding
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue within
the sequence, thus producing a silent change. Likewise, the megalomicin biosynthetic
enzyme derivatives of the invention include, but are not limited to, those
containing, as a primary amino acid sequence, all or part of the amino acid
sequence of megalomicin biosynthetic enzymes, including altered sequences in
which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a
similar polarity which acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence may be selected
from other members of the class to which the amino acid belongs. For example,
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
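
The two kinds of silent change described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the class table is copied from the paragraph above, the codon table is a small excerpt of the standard genetic code, and the function names (`residue_class`, `is_conservative`, `is_silent`) are hypothetical and do not come from the specification.

```python
# Amino acid classes as listed in the paragraph above (three-letter codes).
AMINO_ACID_CLASSES = {
    "nonpolar": {"Ala", "Leu", "Ile", "Val", "Pro", "Phe", "Trp", "Met"},
    "polar_neutral": {"Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln"},
    "basic": {"Arg", "Lys", "His"},
    "acidic": {"Asp", "Glu"},
}

# Excerpt of the standard genetic code; only a few codons are included here.
CODON_EXCERPT = {
    "CTG": "Leu", "CTC": "Leu", "TTA": "Leu",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
    "AAA": "Lys", "AAG": "Lys",
}


def residue_class(residue):
    """Return the name of the class that contains the given residue."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_conservative(original, substitute):
    """True if both residues belong to the same class listed above."""
    return residue_class(original) == residue_class(substitute)


def is_silent(codon_a, codon_b):
    """True if two codons from the excerpt encode the same amino acid."""
    return CODON_EXCERPT[codon_a] == CODON_EXCERPT[codon_b]


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))  # same class (nonpolar)
    print(is_conservative("Asp", "Lys"))  # acidic versus basic
    print(is_silent("CTG", "TTA"))        # both encode leucine
```

Class membership is used here only as a first-pass filter; whether a given substitution actually preserves function still has to be established by assay.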

In a specific embodiment of the invention, the nucleic acids encoding
proteins, and proteins, consisting of or comprising a domain or a fragment of a
megalomicin biosynthetic enzyme consisting of at least 6 (continuous) amino
acids are provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention) or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
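
The identity thresholds above presuppose some way of scoring two aligned sequences. The sketch below shows one simple convention, assuming the alignment has already been produced by a separate program and that columns containing a gap (`-`) are excluded from the count. Other conventions for the denominator exist, and the specification does not say which one is intended. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must be pre-aligned strings of equal length, with '-'
    marking gap positions. Gap columns are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical 8-residue alignment with two mismatches.
    print(percent_identity("MKTAYIAK", "MKTAHIAR"))
```

A candidate derivative could then be compared against whichever threshold (30% through 95%) applies to the embodiment in question.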

The megalomicin biosynthetic enzyme domains, derivatives and analogs of
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned
megalomicin biosynthetic gene sequence can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in
vitro.

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions
and/or form new restriction endonuclease sites or destroy pre-existing ones, to
facilitate further in vitro modification. Any technique for mutagenesis known in
the art can be used in accordance with the methods of the present invention,
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual
gene product can be isolated and analyzed. This is achieved by assays based on
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like.

The megalomicin biosynthetic enzyme proteins may be isolated and
purified by standard methods known in the art from recombinant host cells
expressing the complexes or proteins in accordance with the methods of the
invention, including but not restricted to column chromatography (e.g., ion
exchange, affinity, gel exclusion, reversed-phase high pressure, fast protein liquid,
and the like), differential centrifugation, differential solubility, or by any other
standard technique used for the purification of proteins. Functional properties
may be evaluated using any suitable assay known in the art in accordance with the
methods of the present invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or
derivative is identified, the amino acid sequence of the protein can be deduced
from the nucleotide sequence of the gene which encodes it. As a result, the
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).

Manipulations of megalomicin biosynthetic enzymes may be made at the
protein level. Included within the scope of the invention are megalomicin
biosynthetic enzyme domains, derivatives or analogs or fragments, which are
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule
or other cellular ligand, and the like. Any of numerous chemical modifications
may be carried out by known techniques, including but not limited to specific
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin
biosynthetic enzyme can be chemically synthesized. For example, a peptide
corresponding to a portion of a megalomicin biosynthetic enzyme, which
comprises the desired domain or which mediates the desired activity in vitro, can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotary) or L
(levorotary).

In cases where natural products are suspected of being mutant or are
isolated from new species, the amino acid sequence of the megalomicin
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the
isolated protein. Such analysis may be performed by manual sequencing or
through use of an automated amino acid sequenator.

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA 78:3824-
3828 (1981)). A hydrophilicity profile can be used to identify the hydrophobic
and hydrophilic regions of the proteins, and help predict their orientation in
designing substrates for experimental manipulation, such as in binding
experiments, antibody synthesis, and the like. Secondary structural analysis can
also be done to identify regions of the megalomicin biosynthetic enzyme that
assume specific structures (Chou and Fasman, Biochemistry 13:222-23 (1974)).
Manipulation, translation, secondary structure prediction, hydrophilicity and
hydrophobicity profiles, open reading frame prediction and plotting, and
determination of sequence homologies, can be accomplished using computer
software programs available in the art.
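
A hydrophilicity profile of the kind referred to above is a sliding-window average of per-residue scores. The sketch below is a minimal illustration, not the software contemplated by the specification. It assumes the residue values commonly tabulated for the Hopp and Woods scale and a window of six residues; both should be checked against the cited paper before use. The function names and the test peptide are hypothetical.

```python
# Per-residue hydrophilicity values as commonly tabulated for the
# Hopp and Woods scale (assumption: verify against the cited reference).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Return the mean hydrophilicity of every window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=6):
    """Return (start index, subsequence) of the highest-scoring window."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]


if __name__ == "__main__":
    # Hypothetical peptide: a charged stretch flanked by leucines.
    print(most_hydrophilic_window("LLLLLLKDEKRDLLLL"))
```

Peaks in such a profile are the regions most likely to be surface-exposed, which is why they are candidate sites when choosing peptides for antibody production.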

Other methods of structural analysis including but not limited to X-ray
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, in: Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York) can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immuno-specifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immuno-specifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 1, or a
fragment or derivative of said antibody containing the binding domain thereof, is
provided. Preferably, the antibody is a monoclonal antibody.

The megalomicin biosynthetic enzyme protein and domains, fragments,
homologs and derivatives thereof may be used as immunogens to generate
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab
fragments, and an Fab expression library.

Various procedures known in the art may be used for the production of
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention.

For production of the antibody, various host animals can be immunized by
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV virus in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin
biosynthetic enzyme protein together with genes from a human antibody molecule
of appropriate biological activity can be used; such antibodies are within the scope
of this invention.

Techniques described for the production of single chain antibodies (U.S.
patent 4,946,778) can be adapted to produce megalomicin biosynthetic
enzyme-specific single chain antibodies. An additional embodiment utilizes the
techniques described for the construction of Fab expression libraries (Huse et al.,
Science 246:1275-1281 (1989)) to allow rapid and easy identification of monoclonal
Fab fragments with the desired specificity for megalomicin biosynthetic enzyme, or
domains, derivatives, or analogs thereof. Non-human antibodies can be
"humanized" by known methods (see, e.g., U.S. Patent No. 5,225,539).

Antibody fragments that contain the idiotypes of a megalomicin
biosynthetic enzyme can be generated by techniques known in the art in
accordance with the methods of the present invention. For example, such
fragments include but are not limited to: the F(ab')2 fragment which can be
produced by pepsin digestion of the antibody molecule; the Fab' fragments that
can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab
fragments that can be generated by treating the antibody molecule with papain
and a reducing agent; and Fv fragments.

In the production of antibodies, screening for the desired antibody can be
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic
enzyme, one may assay generated hybridomas for a product that binds to the
fragment of a megalomicin biosynthetic enzyme that contains such a domain.

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention.

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes
In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of
the invention, any host cell other than Micromonospora megalomicea is a
heterologous host cell. Thus, included within the scope of the invention in
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors
that include such nucleic acids. The term expression vector refers to a nucleic acid
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any
cellular compartment, such as a replicating vector in the cytoplasm. An expression
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a
ribosome-binding site sequence positioned upstream of the start codon of the
coding sequence of the gene to be expressed. Other elements, such as enhancers,
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e.,
genes that confer antibiotic resistance or sensitivity, are preferred and confer a
selectable phenotype on transformed cells when the cells are grown in an
appropriate selective medium.

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components
suitable for the expression of genes and maintenance of vectors in E. coli, yeast,
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells.
Promoters can comprise regulatory sequences that allow for regulation of
expression relative to the growth of the host cell or that cause the expression of a
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase (bla), bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression
system, which, in turn, is composed of at least a portion of the megalomicin PKS
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by
transformation with the recombinant DNA expression vectors of the invention to
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom.

Preferred host cells for purposes of selecting vector components for
expression vectors of the present invention include fungal host cells such as yeast
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that
ordinarily do not produce polyketides, it may be necessary to provide, also
typically by recombinant means, suitable holo-ACP synthases to convert the
recombinantly produced PKS to functionality. Provision of such enzymes is
described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells
for purposes of the present invention are Streptomyces and Saccharopolyspora
host cells, as discussed in greater detail below.

In a preferred embodiment, the expression vectors of the invention are
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for
expressing polyketides, because polyketides are naturally produced in certain
Streptomyces species, and Streptomyces cells generally produce the precursors
needed to form the desired polyketide. Those of skill in the art will recognize that,
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of
only those genes constituting the remainder of the desired PKS enzyme or other
polyketide-modifying enzymes. Thus, such a vector may comprise only a single
ORF, with the desired remainder of the polypeptides constituting the PKS
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may
be desirable to modify the host so as to prevent the production of endogenous
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS.

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can
be employed for purposes of the present invention.

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic resistance
conferring genes selected from the group consisting of the ermE (confers
resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes. Alternatively, several polyketides are
naturally colored, and this characteristic can provide a built-in marker for
identifying cells.
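
The marker list above amounts to a lookup from resistance gene to the antibiotics it covers. The sketch below records that mapping exactly as stated in the preceding paragraph and inverts it, which is the question that arises when choosing a selective medium. The dictionary and function names are hypothetical and are not part of the specification.

```python
# Resistance genes and the antibiotics they confer resistance to,
# transcribed from the paragraph above.
SELECTABLE_MARKERS = {
    "ermE": ["erythromycin", "lincomycin"],
    "tsr": ["thiostrepton"],
    "aadA": ["spectinomycin", "streptomycin"],
    "aacC4": ["apramycin", "kanamycin", "gentamicin", "geneticin", "neomycin"],
    "hyg": ["hygromycin"],
    "vph": ["viomycin"],
}


def markers_for(antibiotic):
    """Return the listed marker genes that confer resistance to an antibiotic."""
    wanted = antibiotic.lower()
    return [
        gene
        for gene, antibiotics in SELECTABLE_MARKERS.items()
        if wanted in antibiotics
    ]


if __name__ == "__main__":
    print(markers_for("thiostrepton"))
    print(markers_for("apramycin"))
```

An empty result simply means that none of the markers named in this paragraph covers the antibiotic in question.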

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g.,
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans K4-114
and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the
megalomicin biosynthetic genes or only a subset of the same. For example, if only
the genes for the megalomicin PKS are expressed in a host cell that otherwise does
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by
adding them to the fermentation of a strain such as, for example, Streptomyces
antibioticus or Saccharopolyspora erythraea, that contains the requisite
modification enzymes.

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
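
The conversions described in the preceding paragraph can be read as a small directed graph. The following Python sketch is illustrative only and is not part of the original disclosure. It encodes the six steps named in the text and reports which compounds would be expected to accumulate when 6-dEB is fed to a strain lacking one or more gene products. The function and variable names are hypothetical, and the model is purely qualitative: it assumes each gene product is either fully active or absent and ignores kinetics and substrate tolerance.

```python
from collections import deque

# (substrate, gene product, product) for each step named in the text.
ERY_STEPS = [
    ("6-dEB", "eryF", "erythronolide B"),
    ("erythronolide B", "eryBV", "3-O-mycarosylerythronolide B"),
    ("3-O-mycarosylerythronolide B", "eryCIII", "erythromycin D"),
    ("erythromycin D", "eryG", "erythromycin B"),
    ("erythromycin D", "eryK", "erythromycin C"),
    ("erythromycin C", "eryG", "erythromycin A"),
]


def reachable_compounds(fed, inactive_genes=frozenset()):
    """Return the compounds reachable from `fed`, in breadth-first order,
    when the gene products in `inactive_genes` are absent."""
    seen = [fed]
    queue = deque([fed])
    while queue:
        current = queue.popleft()
        for substrate, gene, product in ERY_STEPS:
            if substrate == current and gene not in inactive_genes and product not in seen:
                seen.append(product)
                queue.append(product)
    return seen


def end_products(fed, inactive_genes=frozenset()):
    """Return reachable compounds that have no remaining active conversion
    step, i.e. where material would be expected to accumulate."""
    reach = reachable_compounds(fed, inactive_genes)
    return [
        compound
        for compound in reach
        if not any(s == compound and g not in inactive_genes for s, g, _ in ERY_STEPS)
    ]
```

Under these assumptions, `end_products("6-dEB", {"eryG"})` returns `["erythromycin C"]`, which matches the statement elsewhere in this Section that an eryG mutant makes erythromycin C.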

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain its corresponding transferase gene,
and does not contain the acylation gene of Micromonospora megalomicea. Finally, the
S. erythraea eryG gene product converts mycarose to cladinose, which does not
occur in M. megalomicea. Thus, the present invention provides a wide variety of
S. erythraea recombinant host cells, including, for example, those that contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with
recombinant megosamine biosynthetic and transfer genes, with and without
megalomicin acylation genes; and

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin
biosynthetic gene has been introduced. 
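
Items (i) through (iii) above describe a small combinatorial design space: the source of the PKS genes (native eryA or recombinant megA), the presence or absence of eryG, and the presence or absence of the megalomicin acylation genes, with the megosamine genes present in every case. The Python sketch below simply enumerates those combinations. It is offered as an illustration under that reading of the list, not as part of the original disclosure, and the dictionary keys are hypothetical labels chosen for the example.

```python
from itertools import product


def enumerate_host_designs():
    """Enumerate the S. erythraea host designs implied by items (i)-(iii).

    Each design is a dict with hypothetical keys describing which gene
    sets are present in the recombinant host.
    """
    designs = []
    for pks_genes, eryG_active, acylation_genes in product(
        ["eryA", "megA"],   # item (iii): eryA replaced by recombinant megA
        [True, False],      # item (i) keeps eryG; item (ii) lacks it
        [False, True],      # "with and without" acylation genes
    ):
        designs.append(
            {
                "pks_genes": pks_genes,
                "eryG_active": eryG_active,
                "megosamine_genes": True,  # present in every listed design
                "acylation_genes": acylation_genes,
            }
        )
    return designs
```

Read this way, the list covers eight host designs, four with the native eryA genes and four with recombinant megA genes.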

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyl transferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152, either in one or two plasmids,
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216: 215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids and the production of megalomicin A
or its precursor is mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D, or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 43:207-33; Jenkins and
Cundliffe, 1991, Gene 108:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes will be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This will be straightforward work since
self-resistance genes are usually found in the cluster of macrolide biosynthesis genes
and can be identified by their homology to known macrolide resistance genes
and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be
re-cloned from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations - the two N-dimethyltransferase genes and the three
glycosyltransferase genes - to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D, and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 85:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono or
diglycoside if the erythromycin glycosyltransferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes into the appropriate S. erythraea background or into S. lividans
- specifically, the necessary deoxysugar biosynthesis and attachment genes - to
create a recombinant strain that produces megalomicin A.

The acyltransferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1, and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin, or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more stable
recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant 
vectors encoding one or more of the megosamine, desosamine, and mycarose 
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the 
vector of the invention is introduced. For example, neither Streptomyces lividans 
nor S. coelicolor is naturally capable of making megosamine, desosamine, or 
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and 
transferring those moieties to a polyketide. 

Moreover, additional recombinant gene products can be expressed in the 
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the 
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6 
prior to processing by other domains on the PKS, in particular, any KR, DH, 
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods, 
expression vectors, and recombinant host cells that enable the production of 
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a 
wide variety of polyketides derived in part from the megalomicin PKS or other 
biosynthetic genes, as described in the following Section. 

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding 
each of the domains of each of the modules of the megalomicin PKS as well as the 
other megalomicin biosynthetic enzymes. The availability of these compounds 
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification 
enzymes. These compounds also permit the modification of polyketides with the 
various megalomicin modification enzymes. The resulting hybrid PKS can then be 
expressed in a host cell to produce a desired polyketide or modified form thereof.

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or can be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one 
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS is composed of a
loading module, six extender modules each composed of a KS, AT, ACP, and zero,
one, two, or three KR, DH, and ER domains, and a thioesterase domain. The DNA
compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives 
expression of the subunit in a host cell that expresses the other subunits and so 
produces a functional PKS. 

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by the coding sequence for the megalomicin
PKS loading module, provides a novel PKS. Examples of such heterologous PKS
coding sequences include the DEBS, rapamycin, FK-506, FK-520, rifamycin, and
avermectin PKS coding sequences. In
another embodiment, a DNA compound comprising a sequence that encodes the
megalomicin PKS loading module is inserted into a DNA compound that
comprises the coding sequence for the megalomicin PKS or a recombinant
megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KSQ, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the first 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS first 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for modules of the heterologous PKS, provides a novel PKS 
coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a 
megalomicin derivative. 

In another embodiment, a portion or all of the first extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a gene
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.
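
The domain replacements described for the first extender module can be pictured as edits to a simple record. The Python sketch below is an illustration only, not part of the original disclosure. It models an extender module by the extender unit its AT selects and by its set of reductive domains, then reports the expected substituent at the alpha carbon and the expected oxidation state at the beta carbon after a swap. The class and function names are hypothetical. The mapping from reductive domains to beta-carbon state (none: keto; KR: hydroxyl; KR and DH: alkene; KR, DH, and ER: methylene) is general modular PKS biochemistry rather than something stated in this paragraph, and the sketch assumes the swapped domains are fully functional in their new context.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet

# Alpha-carbon substituent contributed by each extender unit.
ALPHA_SUBSTITUENT = {
    "malonyl-CoA": "hydrogen",
    "methylmalonyl-CoA": "methyl",
    "ethylmalonyl-CoA": "ethyl",
    "2-hydroxymalonyl-CoA": "hydroxyl",
}


@dataclass(frozen=True)
class ExtenderModule:
    name: str
    at_substrate: str                   # extender unit selected by the AT domain
    reductive_domains: FrozenSet[str]   # subset of {"KR", "DH", "ER"}


def alpha_substituent(module):
    """Substituent expected at the alpha carbon for the module's AT specificity."""
    return ALPHA_SUBSTITUENT[module.at_substrate]


def beta_carbon_state(module):
    """Expected beta-carbon state given the module's reductive domains."""
    domains = module.reductive_domains
    if {"KR", "DH", "ER"} <= domains:
        return "methylene"
    if {"KR", "DH"} <= domains:
        return "alkene"
    if "KR" in domains:
        return "hydroxyl"
    return "keto"  # no functional KR, so the beta-keto group is retained


def swap_at(module, new_substrate):
    """Return a copy of the module with a different AT specificity."""
    return replace(module, at_substrate=new_substrate)


def set_reductive_domains(module, domains):
    """Return a copy of the module with a new set of reductive domains."""
    return replace(module, reductive_domains=frozenset(domains))
```

For example, starting from `ExtenderModule("module 1", "methylmalonyl-CoA", frozenset({"KR"}))`, calling `swap_at(..., "malonyl-CoA")` models the AT replacement, and `set_reductive_domains(..., [])` models deletion or inactivation of the KR, which under these assumptions leaves a keto group at the beta carbon.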

Those of skill in the art will recognize, however, that deletion of the KR 
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule. 

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode such PKSs in which the 
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more,
typically two, proteins to form the multi-protein functional PKS. The utility of
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S. 
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication 
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein 
by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that
for the second extender module of the megalomicin PKS or the latter is merely
added to coding sequences for the modules of the heterologous PKS, provides a
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module
coding sequence can be utilized in conjunction with a coding sequence from a
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the 
third extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA 
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER,
or ACP coding sequence can originate from a coding sequence for another module
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third 
extender module coding sequence can be utilized in conjunction with a coding 
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or 
another polyketide. 

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fourth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fourth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion of the fourth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER,
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition,
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fifth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fifth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the fifth extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the sixth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS sixth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the sixth extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or
replacing the KR with another KR, a KR and DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous sixth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a
thioesterase domain. This domain is important in the cyclization of the polyketide
and its cleavage from the PKS. The present invention provides recombinant DNA
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding
sequence from another PKS gene can be inserted at the end of the sixth (or other
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase
domain are useful in constructing DNA compounds that encode the megalomicin
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result
not only:

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,
but also:

(ii) from fusions of heterologous modules (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are
described herein.

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.

While many of the hybrid PKSs described above are composed primarily
of megalomicin PKS proteins, those of skill in the art recognize that the present
invention provides many different hybrid PKSs, including those composed of only
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration into the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in expressing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the
products of the picAI and picAII genes (the two proteins that comprise the loading
module and extender modules 1 - 4, inclusive, of the narbonolide PKS) and the
megAIII gene. The resulting hybrid PKS produces the macrolide aglycone
3-hydroxy-narbonolide in Streptomyces lividans host cells and the corresponding
erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329, and Baltz, 1998,
Trends Microbiol. 6: 76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the
recombinant megalomicin PKS genes of the invention.
These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have
been used for precursor directed biosynthesis of analogs that incorporate
synthetically derived starter units. For example, more than 100 novel polyketides
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92: 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96: 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.

Methods for generating libraries of polyketides have been greatly improved
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277: 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large
collection of glycosylated polyketides can be prepared.
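
The plasmid-combination strategy described above can be sketched as a simple
enumeration. The following Python fragment is only an illustration under stated
assumptions: the gene names are taken from the text, but the variant labels and
the function name enumerate_library are hypothetical placeholders and do not
correspond to specific constructs of the invention.

```python
from itertools import product

# Hypothetical variant sets: one compatible plasmid per PKS gene, each
# available in several wild-type, mutant, or hybrid forms. The variant
# labels are placeholders, not constructs described in the text.
plasmid_variants = {
    "megAI": ["wild-type", "KS1-null", "hybrid-loading"],
    "megAII": ["wild-type", "AT3-malonyl"],
    "megAIII": ["wild-type", "KR6-deleted", "AT6-malonyl"],
}


def enumerate_library(variants):
    """Return every combination that takes one variant of each gene."""
    genes = list(variants)
    return [dict(zip(genes, combo)) for combo in product(*variants.values())]


library = enumerate_library(plasmid_variants)
# Each entry describes one host strain carrying three plasmids.
print(len(library))
```

Because each plasmid is chosen independently, the size of the library is the
product of the number of variants available for each gene, which is why a modest
number of single-gene variants can yield a large collection of host strains.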

The following Table lists references describing illustrative PKS genes and
corresponding enzymes that can be utilized in the construction of the recombinant
hybrid PKSs and the corresponding DNA compounds that encode them of the
invention. Also presented are various references describing tailoring enzymes and
corresponding genes that can be employed in accordance with the methods of the
invention.

Avermectin
  U.S. Pat. No. 5,252,474 to Merck.
  MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
  Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
  Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
  Erythromycin, and Nemadectin.
  MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
  Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)
  Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone
  PCT Pub. No. 00/031247 to Kosan.

Erythromycin
  PCT Pub. No. 93/13663 to Abbott.
  U.S. Pat. No. 5,824,513 to Abbott.
  Donadio et al., 1991, Science 252: 675-9.
  Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
  multifunctional polypeptide in the erythromycin producing polyketide synthase of
  Saccharopolyspora erythraea.

Glycosylation Enzymes
  PCT Pub. No. 97/23630 to Abbott.

FK-506
  Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
  ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
  Motamedi et al., 1997, Structural organization of a multifunctional
  polyketide synthase involved in the biosynthesis of the macrolide
  immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.

Methyltransferase
  U.S. Pat. No. 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
  Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
  Motamedi et al., 1996, Characterization of methyltransferase and
  hydroxylase genes involved in the biosynthesis of the immunosuppressants FK506
  and FK520, J. Bacteriol. 178: 5243-5248.

FK-520
  PCT Pub. No. 00/20601 to Kosan.
  See also Nielsen et al., 1991, Biochem. 30: 5789-96 (enzymology of
  pipecolate incorporation).

Lovastatin
  U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)
  PCT Pub. No. WO US99/61599 to Kosan.

Nemadectin
  MacNeil et al., 1993, supra.

Niddamycin
  Kakavas et al., 1997, Identification and characterization of the niddamycin
  polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179:
  7515-7522.

Oleandomycin
  Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
  encoding a type I polyketide synthase which has an unusual coding sequence, Mol.
  Gen. Genet. 242: 358-362.
  PCT Pub. No. 00/026349 to Kosan.
  Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
  region involved in oleandomycin biosynthesis, which encodes two
  glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol.
  Gen. Genet. 259(3): 299-308.

Platenolide
  EP Pub. No. 791,656 to Lilly.

Rapamycin
  Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
  polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.
  Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
  rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in
  the modular polyketide synthase, Gene 169: 9-16.

Rifamycin
  August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
  rifamycin: deductions from the molecular analysis of the rif biosynthetic gene
  cluster of Amycolatopsis mediterranei S699, Chemistry & Biology 5(2): 69-79.

Soraphen
  U.S. Pat. No. 5,716,849 to Novartis.
  Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium
  cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide
  Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide
  Synthase Genes from Actinomycetes.

Spiramycin
  U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene
  U.S. Pat. No. 5,514,544 to Lilly.

Tylosin
  EP Pub. No. 791,655 to Lilly.
  Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide
  through the construction of a hybrid polyketide synthase.
  U.S. Pat. No. 5,876,991 to Lilly.

Tailoring enzymes
  Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355.
  Analysis of five tylosin biosynthetic genes from the tylBA region of the
  Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention. 
In constructing hybrid PKSs of the invention, certain general methods may
be helpful. For example, it is often beneficial to retain the framework of the
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and
ER functionalities to a module, it is often preferred to replace the KR domain of
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units,"
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases," Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one
illustrative example, the invention provides a hybrid PKS that contains the
naturally occurring loading module and thioesterase domain as well as extender
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA
and derived from, for example, the rapamycin PKS genes.

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
different genes or gene clusters derived from a naturally occurring PKS gene
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or
other naturally occurring PKS includes a modular PKS (or its corresponding
encoding gene(s)) that retains the scaffolding of the utilized portion of the
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the origin of the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS.

Thus, a PKS derived from the megalomicin PKS includes a PKS that
contains the scaffolding of all or a portion of the megalomicin PKS. The derived
PKS also contains at least two extender modules that are functional, preferably
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
polyketide is altered at both the protein and DNA sequence levels. Particular
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to
change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels.

Thus, there are at least five degrees of freedom for constructing a hybrid
PKS in terms of the polyketide that will be produced. First, the polyketide chain
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the PKS is
determined by the specificities of the acyl transferases that determine the nature of
the extender units at each position, e.g., malonyl, methylmalonyl, ethylmalonyl, or
other substituted malonyl. Third, the loading module specificity also has an effect
on the resulting carbon skeleton of the polyketide. The loading module may use a
different starter unit, such as acetyl, butyryl, and the like. As noted above, another
method for varying loading module specificity involves inactivating the KS
activity in extender module 1 (KS1) and providing alternative substrates, called
diketides, that are chemically synthesized analogs of extender module 1 diketide
products, for extender module 2. This approach was illustrated in PCT publication
Nos. 97/02358 and 99/03986, incorporated herein by reference, wherein the KS1
activity was inactivated through mutation. Fourth, the oxidation state at various
positions of the polyketide will be determined by the dehydratase and reductase
portions of the modules. This will determine the presence and location of ketone
and alcohol moieties and C-C double bonds or C-C single bonds in the polyketide.

Finally, the stereochemistry of the resulting polyketide is a function of
three aspects of the synthase. The first aspect is related to the AT/KS specificity
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity
of the ketoreductase may determine the chirality of any beta-OH. Finally, the
enoylreductase specificity for substituted malonyls as extender units may influence
the stereochemistry when there is a complete KR/DH/ER available.
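
The relationship between the reductive domains of a module and the oxidation
state it leaves at the beta-carbon, together with the role of the AT in choosing
the extender unit, can be illustrated with a small model. The sketch below is a
simplification offered for illustration only: the names ExtenderModule,
beta_carbon_state, and describe are hypothetical, the example PKS is invented, and
stereochemistry and loading module specificity are not modeled.

```python
from dataclasses import dataclass

# Outcome at the beta-carbon for each set of active reductive domains.
BETA_CARBON_OUTCOME = {
    frozenset(): "ketone",
    frozenset({"KR"}): "hydroxyl",
    frozenset({"KR", "DH"}): "double bond",
    frozenset({"KR", "DH", "ER"}): "single bond",
}


@dataclass(frozen=True)
class ExtenderModule:
    at_specificity: str                         # extender unit chosen by the AT
    reductive_domains: frozenset = frozenset()  # active KR/DH/ER domains


def beta_carbon_state(domains):
    """Return the beta-carbon outcome for a set of active reductive domains."""
    key = frozenset(domains)
    if key not in BETA_CARBON_OUTCOME:
        # Simplifying assumption: a DH or ER lacking the preceding activity
        # is treated as an unsupported arrangement.
        raise ValueError(f"unsupported reductive domain set: {sorted(key)}")
    return BETA_CARBON_OUTCOME[key]


def describe(modules):
    """List (extender unit, beta-carbon outcome) for each module in order."""
    return [(m.at_specificity, beta_carbon_state(m.reductive_domains))
            for m in modules]


# Hypothetical three-module PKS, for illustration only.
example = [
    ExtenderModule("methylmalonyl", frozenset({"KR"})),
    ExtenderModule("malonyl"),
    ExtenderModule("methylmalonyl", frozenset({"KR", "DH", "ER"})),
]
for unit, outcome in describe(example):
    print(unit, outcome)
```

In this model the number of entries in the list stands in for chain length, the
at_specificity field for the extender unit at each position, and the
reductive_domains field for the oxidation state, corresponding to the first,
second, and fourth degrees of freedom discussed above.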

Thus, the modular PKS systems generally and the megalomicin PKS
system particularly permit a wide range of polyketides to be synthesized. As
compared to the aromatic PKS systems, the modular PKS systems accept a wider
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl,
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90:
7119-7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products
in a highly controlled manner. The polyketides, antibiotics, and other compounds
produced by the methods of the invention are typically single stereoisomeric
forms. Although the compounds of the invention can occur as mixtures of
stereoisomers, it may be beneficial in some instances to generate individual
stereoisomers. Thus, the combinatorial potential within modular PKS pathways
based on any naturally occurring modular, such as the megalomicin, PKS scaffold
is virtually unlimited.
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While hybrid PKSs are most often produced by "mixing and matching" 
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be 
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations 
can be made to the native sequences using conventional techniques. The substrates 
5 for mutation can be an entire cluster of genes or only one or two of them; the 
substrate for mutation may also be portions of one or more of these genes. 
Techniques for mutation include preparing synthetic oligonucleotides including 
the mutations and inserting the mutated sequence into the gene encoding a PKS 
subunit using restriction endonuclease digestion. See, e.g. 3 Kunkel, 1985, Proc. 

10 Natl Acad. Scl USA 82: 448; Geisselsoder et al, 1987, BioTechniques 5:786. 

Alternatively, the mutations can be effected using a mismatched primer (generally 
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a 
temperature below the melting temperature of the mismatched duplex. The primer 
can be made specific by keeping primer length and base composition within 

15 relatively narrow limits and by keeping the mutant base centrally located. See 

Zoller and Smith, 1983, Methods EnzymoL 700:468. Primer extension is effected 
using DNA polymerase, the product cloned, and clones containing the mutated 
DNA, derived by segregation of the primer extended strand, selected. 
Identification can be accomplished using the mutant primer as a hybridization 

20 probe. The technique is also applicable for generating multiple point mutations. 
See, e.g., Dalbie-McFarland et al, 1982, Proc. Natl Acad. ScL USA 79: 6409. 
PCR mutagenesis can also be used to effect the desired mutations. 
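
As an illustration of the mismatched-primer approach described above, the
following is a minimal Python sketch, not part of the original disclosure, of
how such a primer might be laid out. The function names, the example template
sequence, and the default of nine matching bases on each side of the mutant
base (giving a 19-nucleotide primer, within the 10-20 nucleotide range
mentioned above) are assumptions made for illustration only; practical primer
design would also take melting temperature and secondary structure into
account.

```python
def design_mutagenic_primer(template: str, position: int, new_base: str,
                            flank: int = 9) -> str:
    """Return a primer identical to `template` around `position`, except for
    a single, centrally located mismatch (`new_base`).

    `position` is a 0-based index; `flank` matching bases are kept on each
    side, so the primer is 2 * flank + 1 nucleotides long. The primer is
    written in the same sense as `template`, so it would anneal to the
    complementary strand. All names here are hypothetical.
    """
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if not 0 <= position < len(template):
        raise ValueError("position is outside the template")
    if template[position] == new_base:
        raise ValueError("new_base equals the native base")
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(template):
        raise ValueError("not enough flanking sequence to centre the mismatch")
    return template[start:position] + new_base + template[position + 1:end]


def count_mismatches(primer: str, template: str, start: int) -> int:
    """Count positions where `primer` differs from `template` at `start`."""
    window = template.upper()[start:start + len(primer)]
    return sum(1 for p, t in zip(primer.upper(), window) if p != t)


# Example with an arbitrary, made-up template sequence.
template = "ATGGCTAGCTAGGATCCGATCGATCGTTAGC"
primer = design_mutagenic_primer(template, position=15, new_base="A")
print(primer, len(primer), count_mismatches(primer, template, 15 - 9))
```

Keeping the flank length equal on both sides is what places the mutant base
at the centre of the primer, as the text recommends.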

Random mutagenesis of selected portions of the nucleotide sequences encoding
enzymatic activities can also be accomplished by several different techniques
known in the art, e.g., by inserting an oligonucleotide linker randomly into a
plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases, thereby
preventing normal base-pairing, such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil or 2-aminopurine, or acridine
intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli, and propagated as a pool or library of mutant
plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKSs
or from different locations in the same PKS, can be recovered, for example,
using PCR techniques with appropriate primers. By "corresponding" activity
encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR activity encoded at another location in the gene cluster
or in a different gene cluster. Similarly, a complete reductase cycle could be
considered corresponding. For example, KR/DH/ER can correspond to a KR alone.

If replacement of a particular target region in a host PKS is to be made, this
replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in
PCT publication No. WO 96/40968, incorporated herein by reference. The vectors
used to perform the various operations to replace the enzymatic activity in the
host PKS genes or to support mutations in these regions of the host PKS genes
can be chosen to contain control sequences operably linked to the resulting
coding sequences in a manner such that expression of the coding sequences can
be effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning vectors
employed to obtain PKS genes encoding derived PKS lack control sequences for
expression operably linked to the encoding nucleotide sequences, the nucleotide
sequences are inserted into appropriate expression vectors. This need not be
done individually; rather, a pool of isolated encoding nucleotide sequences can
be inserted into expression vectors, the resulting vectors transformed or
transfected into host cells, and the resulting cells plated out into individual
colonies. The invention provides a variety of recombinant DNA compounds in
which the various coding sequences for the domains and modules of the
megalomicin PKS are flanked by non-naturally occurring restriction enzyme
recognition sites.

The various PKS nucleotide sequences can be cloned into one or more recombinant
vectors as individual cassettes, with separate control elements, or under the
control of, e.g., a single promoter. The PKS subunit encoding regions can
include flanking restriction sites to allow for the easy deletion and insertion
of other PKS subunit encoding sequences so that hybrid PKSs can be generated.
The design of such unique restriction sites is known to those of skill in the
art and can be accomplished using the techniques described above, such as
site-directed mutagenesis and PCR.
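
To make the idea of unique flanking restriction sites concrete, the following
is a minimal Python sketch, offered as an illustration rather than as the
method of the invention. It flanks a coding cassette with two recognition
sequences and checks that each occurs exactly once in the resulting construct.
The function names and the cassette sequence are hypothetical; the two sites
used (GAATTC for EcoRI and GGATCC for BamHI) are standard recognition
sequences, and a real design would also check the vector backbone.

```python
def find_sites(seq: str, site: str) -> list:
    """Return every 0-based start index at which `site` occurs in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def flank_cassette(cassette: str, left_site: str, right_site: str) -> str:
    """Flank `cassette` with two distinct sites, each unique in the result.

    Only one strand is searched, which suffices for palindromic recognition
    sequences such as the two used in the example below.
    """
    if left_site.upper() == right_site.upper():
        raise ValueError("use two different sites so orientation is fixed")
    construct = left_site.upper() + cassette.upper() + right_site.upper()
    for site in (left_site, right_site):
        if len(find_sites(construct, site)) != 1:
            raise ValueError(f"site {site} is not unique in the construct")
    return construct


ECORI, BAMHI = "GAATTC", "GGATCC"   # standard recognition sequences
cassette = "ATGGCCAAGCTTGGCTAA"      # made-up stand-in for a domain cassette
print(flank_cassette(cassette, ECORI, BAMHI))
```

Checking the assembled construct, rather than the cassette alone, also catches
sites created accidentally at the junctions.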

The expression vectors containing nucleotide sequences encoding a variety of
PKS enzymes for the production of different polyketides are then transformed
into the appropriate host cells to construct the library. In one
straightforward approach, a mixture of such vectors is transformed into the
selected host cells and the resulting cells plated into individual colonies and
selected to identify successful transformants. Each individual colony has the
ability to produce a particular PKS and ultimately a particular polyketide.
Typically, there will be duplications in some, most, or all of the colonies;
the subset of the transformed colonies that contains a different PKS in each
member colony can be considered the library. Alternatively, the expression
vectors can be used individually to transform hosts, which transformed hosts
are then assembled into a library. A variety of strategies are available to
obtain a multiplicity of colonies each containing a PKS gene cluster derived
from the naturally occurring host gene cluster so that each colony in the
library produces a different PKS and ultimately a different polyketide. The
number of different polyketides that are produced by the library is typically
at least four, more typically at least ten, and preferably at least 20, and
more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with
respect to the variation of starter, extender units, stereochemistry, oxidation
state, and chain length enable the production of quite large libraries.

Methods for introducing the recombinant vectors of the invention into suitable
hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a
library or may be assessed individually for activity.
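
The mixed-vector approach to library construction described above can be
pictured with a short Python sketch. This is a toy model under stated
assumptions, not a description of any actual experiment: it assumes each
colony carries exactly one vector drawn uniformly at random from the mixture,
and the construct names are placeholders. It shows why plated colonies contain
duplicates and how the library is the subset of distinct members.

```python
import random
from collections import Counter


def plate_colonies(vector_pool, n_colonies, seed=0):
    """Simulate colonies, each having taken up one vector from the mixture."""
    rng = random.Random(seed)
    return [rng.choice(vector_pool) for _ in range(n_colonies)]


def library_from_colonies(colonies):
    """Keep one representative per distinct PKS; also report duplicate counts."""
    counts = Counter(colonies)
    return sorted(counts), counts


pool = ["PKS-A", "PKS-B", "PKS-C", "PKS-D", "PKS-E"]  # hypothetical constructs
colonies = plate_colonies(pool, n_colonies=40)
library, counts = library_from_colonies(colonies)
print(len(colonies), "colonies ->", len(library), "distinct library members")
```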

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course,
combination libraries can also be constructed wherein members of a library
derived, for example, from the megalomicin PKS can be considered as a part of
the same library as those derived from, for example, the rapamycin PKS or DEBS.

Colonies in the library are induced to produce the relevant synthases and thus
to produce the relevant polyketides to obtain a library of polyketides. The
polyketides secreted into the media can be screened for binding to desired
targets, such as receptors, signaling proteins, and the like. The supernatants
per se can be used for screening, or partial or complete purification of the
polyketides can first be effected. Typically, such screening methods involve
detecting the binding of each member of the library to receptor or other target
ligand. Binding can be detected either directly or through a competition assay.
Means to screen such libraries for binding are well known in the art and can be
applied in accordance with the methods of the present invention. Alternatively,
individual polyketide members of the library can be tested against a desired
target. In this event, screens wherein the biological response of the target is
measured can more readily be included. Antibiotic activity can be verified
using typical screening assays such as those set forth in Lehrer et al., 1991,
J. Immunol. Meth. 137: 167-173, incorporated herein by reference, and in the
Examples below.

The invention provides methods for the preparation of a large number of
polyketides. These polyketides are useful intermediates in formation of
compounds with antibiotic or other activity through hydroxylation, epoxidation,
and glycosylation reactions as described above. In general, the polyketide
products of the PKS must be further modified, typically by hydroxylation and
glycosylation, to exhibit potent antibiotic activity. Hydroxylation results in
the novel polyketides of the invention that contain hydroxyl groups at C-6,
which can be accomplished using the hydroxylase encoded by the eryF gene,
and/or C-12, which can be accomplished using the hydroxylase encoded by the
picK or eryK gene. Also, the oleP gene is available in recombinant form, which
can be used to express the oleP gene product in any host cell. A host cell,
such as a Streptomyces host cell or a Saccharopolyspora erythraea host cell,
modified to express the oleP gene thus can be used to produce polyketides
comprising the C-8-C-8a epoxide present in oleandomycin. Thus the invention
provides such modified polyketides. The presence of hydroxyl groups at these
positions can enhance the antibiotic activity of the resulting compound
relative to its unhydroxylated counterpart.

Methods for glycosylating polyketides are generally known in the art and can be
applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic
means as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose,
and/or megosamine is effected in accordance with the methods of the invention
in recombinant host cells provided by the invention. In general, the approaches
to effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods
may be employed.

The antibiotic modular polyketides may contain any of a number of different
sugars, although D-desosamine, or a close analog thereof, is most common.
Erythromycin, picromycin, megalomicin, narbomycin, and methymycin contain
desosamine. Erythromycin also contains L-cladinose (3-O-methyl mycarose).
Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97:
3512-3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide
aglycones as starting materials and using Saccharopolyspora erythraea or
Streptomyces venezuelae or other host cell to make the conversion, preferably
using mutants unable to synthesize macrolides, as discussed in the preceding
Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS 
enzymes of the invention. These polyketides are useful as antibiotics and as 
intermediates in the synthesis of other useful compounds, as described in the 
following section. 


Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid encoding a
megalomicin PKS domain, module, or protein, or megalomicin modification enzyme
at a single genetic locus, e.g., on a single plasmid or at a single chromosomal
locus, or at different genetic loci, e.g., on separate plasmids and/or
chromosomal loci. By "multiple" is meant two or more; by "vector" is meant a
nucleic acid molecule which can be used to transform host systems and which
contains an independent expression system containing a coding sequence under
control of a promoter and optionally a selectable marker and any other suitable
sequences regulating expression. Typically, such vectors are plasmids, but
other vectors such as phagemids, cosmids, viral vectors and the like can be
used according to the nature of the host. Of course, one or more of the
separate vectors may integrate into the chromosome of the host (selection may
not be required for maintenance of integrated vectors).

In one embodiment, the invention provides a recombinant host cell, which
comprises at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprising a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme operably
linked to a promoter. In another embodiment, the invention provides a
recombinant host cell, which comprises at least one autonomously replicating
recombinant DNA expression vector and at least one modified chromosome, each of
said vector(s) and each of said modified chromosome(s) comprising a recombinant
DNA compound encoding a megalomicin PKS domain or a megalomicin modification
enzyme operably linked to a promoter. Preferably, the autonomously replicating
recombinant DNA expression vector and/or the modified chromosome further
comprise distinct selectable markers.

The above multiple-vector (chromosome) expression systems can also be used for
expressing heterogeneous polyketide biosynthetic enzymes, e.g., for expressing
Micromonospora megalomicea megalomicin PKS protein, module, or domain or a
megalomicin modification enzyme with a PKS protein, module, or domain, or
modification enzyme from other origins in the same host cells. By placing
various activities on different expression vectors, a high degree of variation
can be achieved in an efficient manner. A variety of hosts can be used; any
suitable host cell that can maintain multiple vectors can readily be used.
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and
plant cells; mammalian or insect cells or other suitable recombinant hosts can
also be used. Preferred among yeast strains are Saccharomyces cerevisiae and
Pichia pastoris. Preferred actinomycetes include various strains of
Streptomyces.

If one chooses to use a host cell that does not naturally produce a polyketide,
then one may need to ensure that the recombinant host is modified to also
contain a holo ACP synthase activity that effects pantetheinylation of the acyl
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by
reference. One of the multiple vectors may be used for this purpose. This
pantetheinylation step is necessary for activation of the ACP. The expression
system for the holo ACP synthase may be supplied on a vector separate from that
carrying a PKS coding sequence or may be supplied on the same vector or may be
integrated into the chromosome of the host, or may be supplied as an expression
system for a fusion protein with all or a portion of a polyketide synthase (see
U.S. Patent No. 6,033,883, incorporated herein by reference).

It should be noted that in some recombinant hosts, it may also be necessary to
activate the polyketides produced through postsynthesis modifications when
polyketides having such modifications are desired. If this is the case for a
particular host, the host will be modified, for example by transformation, to
contain those enzymes necessary for effecting these modifications. Among such
enzymes, for example, are glycosylation enzymes. The use of multiple vectors
can facilitate the introduction of expression systems for such enzymes.

In a preferred embodiment, the multiple vector system is used to assemble
rapidly and efficiently a combinatorial library of polyketides and the
PKS/modification enzymes that produce them. In an illustrative embodiment, the
multiple vector system comprises four different vectors, one comprising the
megAI gene, one the megAII gene, one the megAIII gene, and one the modification
enzyme(s) gene(s). Each of these vectors can be modified to make a set of
vectors. For example, one set could contain all possible AT substitutions in
the loading and first and second extender modules of the megAI gene product.
Another set could contain expression systems for a variety of different
modification enzymes. With these four vector sets and by combining each member
of each set with each member of the other three sets, a very large library of
cells, vector sets, and polyketides can be rapidly and efficiently assembled.
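
Combining each member of each vector set with each member of the other sets is
a Cartesian product, which the following minimal Python sketch illustrates.
The set contents and member names are purely hypothetical placeholders chosen
for the example; only the four-set structure (megAI, megAII, megAIII, and
modification enzyme vectors) comes from the text above.

```python
from itertools import product

# Hypothetical vector sets; the member names are placeholders only.
meg_ai_set = ["megAI-native", "megAI-AT-L-swap", "megAI-AT1-swap"]
meg_aii_set = ["megAII-native", "megAII-variant"]
meg_aiii_set = ["megAIII-native", "megAIII-variant"]
modification_set = ["none", "hydroxylase", "glycosylation", "both"]


def combine_vector_sets(*vector_sets):
    """Return every way of picking one member from each vector set."""
    return list(product(*vector_sets))


library = combine_vector_sets(meg_ai_set, meg_aii_set, meg_aiii_set,
                              modification_set)
print(len(library))  # 3 * 2 * 2 * 4 = 48 combinations
```

The library size is the product of the set sizes, so adding even one variant
to any single set multiplies the number of host cells that can be assembled.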

The combinatorial potential of a modular PKS such as the megalomicin PKS
(ignoring the additional potential of different modification enzyme systems) is
minimally given by: AT_L × (AT_E × 4)^M, where AT_L is the number of loading
acyl transferases, AT_E is the number of extender acyl transferases, and M is
the number of modules in the gene cluster. The number 4 is present in the
formula because this represents the number of ways a keto group can be
modified: 1) no reaction; 2) KR activity alone; 3) KR+DH activity; or 4)
KR+DH+ER activity. It has been shown that expression of only the first two
modules of the erythromycin PKS resulted in the production of a predicted
truncated triketide product (see Kao et al., J. Am. Chem. Soc., 116:
11612-11613 (1994)). A novel 12-membered macrolide similar to methymycin
aglycone was produced by expression of modules 1-5 of this PKS in S. coelicolor
(see Kao et al., J. Am. Chem. Soc., 117: 9105-9106 (1995)). This work shows
that PKS modules are functionally independent so that lactone ring size can be
controlled by the number of modules present.

In addition to controlling the number of modules, the modules can be
genetically modified, for example, by the deletion of a ketoreductase domain as
described by Donadio et al., Science, 252: 675-679 (1991); and Donadio et al.,
Gene, 115: 97-103 (1992). In addition, the mutation of an enoyl reductase
domain was reported by Donadio et al., Proc. Natl. Acad. Sci., 90: 7119-7123
(1993). These modifications also resulted in modified PKS and thus modified
polyketides.
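
The minimal combinatorial count discussed above, the number of loading acyl
transferases multiplied by the M-th power of four times the number of extender
acyl transferases, can be evaluated with a short Python sketch. The function
name, parameter names, and the example values are assumptions chosen for
illustration; they are not figures taken from the specification.

```python
def combinatorial_potential(n_loading_at: int, n_extender_at: int,
                            n_modules: int, n_keto_states: int = 4) -> int:
    """Evaluate AT_L * (AT_E * 4)**M.

    `n_keto_states` defaults to 4, the number of keto-group outcomes listed
    in the text (no reaction, KR, KR+DH, KR+DH+ER).
    """
    if min(n_loading_at, n_extender_at, n_keto_states) < 1 or n_modules < 0:
        raise ValueError("counts must be positive and n_modules non-negative")
    return n_loading_at * (n_extender_at * n_keto_states) ** n_modules


# Illustrative values only: 2 loading ATs, 2 extender ATs, 6 modules.
print(combinatorial_potential(2, 2, 6))  # 2 * (2 * 4)**6 = 524288
```

Because the module count appears as an exponent, the count grows far faster
with additional modules than with additional acyl transferase choices.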

As stated above, in the present invention, the coding sequences for catalytic
activities derived from the megalomicin PKS systems found in nature can be used
in their native forms or modified by standard mutagenesis techniques to delete
or diminish activity or to introduce an activity into a module in which it was
not originally present. For example, a KR activity can be introduced into a
module normally lacking that function.

In one embodiment of the invention herein, a single host cell is modified to
contain a multiplicity of vectors, each vector contributing a portion of the
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each of
the multiple vectors for production of the megalomicin PKS system typically
encodes at least two modules, and at least one of the vectors integrates into
the chromosome of the host. Integration can be effected using suitable phage or
integrating vectors or by homologous recombination. If homologous recombination
is used, the integration event may also be designed to delete endogenous PKS
genes residing in the chromosome, as described in the PCT application WO
95/08548. In these embodiments, too, a selectable marker such as hygromycin or
thiostrepton resistance can be included in the vector that effects integration.

As mentioned above, additional enzymes that effect post-translational
modifications to the enzyme systems in the megalomicin PKS may be introduced
into the host through suitable recombinant expression systems. In addition,
enzymes that activate the polyketides themselves, for example, through
glycosylation, may be added. It may also be desirable to modify the cell to
produce more of a particular substrate utilized in polyketide biosynthesis. For
example, it is generally believed that malonyl CoA levels in yeast are higher
than methylmalonyl CoA levels; if yeast is chosen as a host, it may be desirable
to increase methylmalonyl CoA levels by the addition of one or more
biosynthetic enzymes therefor.

The multiple-vector expression system can also be used to make polyketides
produced by the addition of synthetic starter units to a PKS that contains an
inactivated ketosynthase (KS) in the first module. As noted above, this
modification permits the system to incorporate a suitable diketide thioester
such as 3-hydroxy-2-methyl pentanoic acid N-acetyl cysteamine thioester, or
similar thioesters of diketide analogs, as described by Jacobsen et al.,
Science, 277: 367-369 (1997). The construction of PKS modules containing
inactivated ketosynthase regions can be conducted by methods known in the art,
such as the method described in U.S. Patent No. 6,080,555 and PCT publication
Nos. WO 99/03986 and 97/02358, each of which is incorporated herein by
reference, in accordance with the methods of the present invention.

The multiple-vector expression system can be used to produce polyketides in
hosts that normally do not produce them, such as E. coli and yeast. It also
provides more efficient means to provide a variety of polyketide products by
supplying the elements of the introduced PKS, whether in an E. coli or yeast
host or in other more traditionally used hosts, such as Streptomyces. The
invention also includes libraries of polyketides prepared using the methods of
the invention.


Section VIII: Compounds 

The methods and recombinant DNA compounds of the invention are useful in the
production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to erythromycin, a
potent antibiotic compound. The invention also provides novel ketolide
compounds, polyketide compounds with potent antibiotic activity of significant
interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by
reference. Most if not all of the ketolides prepared to date are synthesized
using erythromycin A, a derivative of 6-dEB, as an intermediate. In one
embodiment, the present invention provides the 3-keto derivatives of the
megalomicins for use as antibiotics. In particular, the 3-keto derivative of
megalomicin A is a preferred ketolide of the invention. These compounds can be
made chemically, substantially in accordance with the procedures for making
ketolides described in the prior art, or in recombinant host cells of the
invention in which the megosamine and desosamine biosynthetic and transferase
genes are present but which do not make or transfer the mycarose moiety and/or
the PKS has been modified to delete the KR domain of extender module 6. The
invention also provides methods for making intermediates useful in preparing
traditional, 6-dEB- and erythromycin-derived ketolide compounds. See
Griesgraber et al., supra; Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100,
U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467; 5,747,466;
5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780; 5,444,051;
5,439,890; 5,439,889; and PCT publication Nos. WO 98/09978 and 98/28316, each
of which is incorporated herein by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in a
host cell that contains the desosamine, megosamine, and/or mycarose
biosynthetic genes and corresponding transferase genes as well as the required
hydroxylase gene(s), which may, for example and without limitation, be either
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6
position). The resulting compounds have antibiotic activity but can be further
modified, as described in the patent publications referenced above, to yield a
desired compound with improved or otherwise desired properties. Alternatively,
the aglycone compounds can be produced in the recombinant host cell, and the
desired glycosylation and hydroxylation steps carried out in vitro or in vivo,
in the latter case by supplying the converting cell with the aglycone, as
described above.

The compounds of the invention are thus optionally glycosylated forms of the
polyketide set forth in formula (1) below which are hydroxylated at either the
C-6 or the C-12 or both. The compounds of formula (1) can be prepared using the
loading and the six extender modules of a modular PKS, modified or prepared in
hybrid form as herein described. These polyketides have the formula:

including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R1-R6 is independently H or alkyl (1-4C) wherein any alkyl at R1 may
optionally be substituted;

each of X1-X5 is independently two H, H and OH, or =O; or

each of X1-X5 is independently H and the compound of formula (2) contains a
double-bond in the ring adjacent to the position of said X at 2-3, 4-5, 6-7,
8-9 and/or 10-11;

with the proviso that:

at least two of R1-R6 are alkyl (1-4C).

Preferred compounds comprising formula 2 are those wherein at least three of
R1-R5 are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at
least four of R1-R5 are alkyl (1-4C), preferably methyl or ethyl. Also
preferred are those wherein X2 is two H, =O, or H and OH, and/or X3 is H,
and/or X1 is OH and/or X4 is OH and/or X5 is OH. Also preferred are compounds
with variable R* when each of R1-R5 is methyl, X2 is =O, and X1, X4 and X5 are
OH. The glycosylated forms (i.e., mycarose or cladinose at C-3, desosamine at
C-5, and/or megosamine at C-6) of the foregoing are also preferred.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds that have,
or that can be readily modified to have, useful activities. For example,
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful compounds.
The compounds provided by the present invention can be provided to cultures of
Saccharopolyspora erythraea and converted to the corresponding derivatives of
erythromycins A, B, C, and D in accordance with the procedure provided in the
Examples, below. To ensure that only the desired compound is produced, one can
use an S. erythraea eryA mutant that is unable to produce 6-dEB but can still
carry out the desired conversions (Weber et al., 1985, J. Bacteriol. 164(1):
425-433). Also, one can employ other mutant strains, such as eryB, eryC, eryG,
and/or eryK mutants, or mutant strains having mutations in multiple genes, to
accumulate a preferred compound. The conversion can also be carried out in
large fermentors for commercial production. Each of the erythromycins A, B, C,
and D has antibiotic activity, although erythromycin A has the highest
antibiotic activity. Moreover, each of these compounds can form, under
treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For
formation of hemiketals with motilide activity, erythromycins B, C, and D are
preferred, as the presence of a C-12 hydroxyl allows the formation of an
inactive compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by hydroxylation
and glycosylation of the compounds of the invention by action of the enzymes
endogenous to Saccharopolyspora erythraea and mutant strains of S. erythraea.
Such compounds are useful as antibiotics or as motilides directly or after
chemical modification. For use as antibiotics, the compounds of the invention
can be used directly without further chemical modification. Erythromycins A, B,
C, and D all have antibiotic activity, and the corresponding compounds of the
invention that result from the compounds being modified by Saccharopolyspora
erythraea also have antibiotic activity. These compounds can be chemically
modified, however, to provide other compounds of the invention with potent
antibiotic activity. For example, alkylation of erythromycin at the C-6
hydroxyl can be used to produce potent antibiotics (clarithromycin is
C-6-O-methyl), and other useful modifications are described in, for example,
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, Agouridas et al., 1998, J.
Med. Chem. 41: 4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510;
5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400;
5,527,780; 5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO
98/09978 and 98/28316, each of which is incorporated herein by reference.

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically
as prokinetic agents to induce phase III of migrating motor complexes, to
increase esophageal peristalsis and LES pressure in patients with GERD, to
accelerate gastric emptying in patients with gastric paresis, and to stimulate
gall bladder contractions in patients after gallstone removal and in diabetics
with autonomic neuropathy. See Peeters, 1999, Motilide Web Site,
http://www.med.kuleuven.ac.be/med/gih/motiHd.htm, and Omura et al., 1987,
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30:
1941-3. The corresponding compounds of the invention that result from the
compounds of the invention being modified by Saccharopolyspora erythraea also
have motilide activity, particularly after conversion, which can also occur in
vivo, to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds
lacking the C-12 hydroxyl are especially preferred for use as motilin agonists.
These compounds can also be further chemically modified, however, to provide
other compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that
can be employed to hydroxylate and/or glycosylate the compounds of the
invention. As described above, the organisms can be mutants unable to produce
the polyketide normally produced in that organism, the fermentation can be carried
out on plates or in large fermentors, and the compounds produced can be
chemically altered after fermentation. In addition to Saccharopolyspora erythraea,
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to
antibiotic activity, compounds of the invention produced by treatment with M.
megalomicea enzymes can have antiparasitic activity as well. Thus, the present
invention provides the compounds produced by hydroxylation and glycosylation
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S.
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene, can be included on expression
vectors suitable for expression of the encoded gene products in
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S.
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans.

The compounds of the invention can be produced by growing and
fermenting the host cells of the invention under conditions known in the art for the
production of other polyketides. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard
procedures. The compounds can be readily formulated to provide the
pharmaceutical compositions of the invention. The pharmaceutical compositions
of the invention can be used in the form of a pharmaceutical preparation, for
example, in solid, semisolid, or liquid form. This preparation will contain one or
more of the compounds of the invention as an active ingredient in admixture with
an organic or inorganic carrier or excipient suitable for external, enteral, or
parenteral application. The active ingredient may be compounded, for example,
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets,
capsules, suppositories, solutions, emulsions, suspensions, and any other form
suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquefied form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used.
For example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138,
incorporated herein by reference, or with a surfactant essentially as described in
EPO patent publication No. 428,169, incorporated herein by reference.



WO 01/27284 



PCT/US00/27433 



Oral dosage forms may be prepared essentially as described by Hondo et
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein
by reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by
reference. The active compound is included in the pharmaceutical composition in
an amount sufficient to produce the desired effect upon the disease process or
condition.

For the treatment of conditions and diseases caused by infection, a
compound of the invention may be administered orally, topically, parenterally, by
inhalation spray, or rectally in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term
parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The
dosage levels are useful in the treatment of the above-indicated conditions (from
about 0.7 mg to about 3.5 g per patient per day, assuming a 70 kg patient). In
addition, the compounds of the invention may be administered on an intermittent
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals.
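
The per-patient figures follow from scaling the per-kilogram ranges by body weight. The short sketch below shows that arithmetic only; the function name `daily_dose_range_mg` and the 70 kg default are assumptions introduced for illustration, and nothing here is dosing guidance.

```python
def daily_dose_range_mg(low_mg_per_kg, high_mg_per_kg, body_weight_kg=70.0):
    """Scale a per-kilogram daily dose range to a per-patient daily range in mg.

    Hypothetical helper: it only multiplies the stated range by body weight.
    """
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low end of the range exceeds the high end")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


# Broad range stated in the text: about 0.01 to about 50 mg/kg/day.
broad = daily_dose_range_mg(0.01, 50.0)
# Preferred range stated in the text: about 0.1 to about 10 mg/kg/day.
preferred = daily_dose_range_mg(0.1, 10.0)

print("broad range (mg/day):", broad)
print("preferred range (mg/day):", preferred)
```

For a 70 kg patient the broad range comes to roughly 0.7 mg to 3,500 mg (3.5 g) per day, and the preferred range to roughly 7 mg to 700 mg per day.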

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host
treated and the particular mode of administration. For example, a formulation
intended for oral administration to humans may contain from 0.5 mg to 5 g of
active agent compounded with an appropriate and convenient amount of carrier
material, which may vary from about 5 percent to about 95 percent of the total
composition. Dosage unit forms will generally contain from about 0.5 mg to about
500 mg of active ingredient. For external administration, the compounds of the
invention may be formulated within the range of, for example, 0.00001% to 60%
by weight, preferably from 0.001% to 10% by weight, and most preferably from
about 0.005% to 0.8% by weight.
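
As a worked illustration of the carrier percentages above, the carrier mass needed for a given amount of active agent can be derived from the carrier's share of the total composition. This is a minimal sketch under the assumption that the composition contains only active agent and carrier; the helper name `carrier_mass_mg` and the 100 mg example amount are hypothetical.

```python
def carrier_mass_mg(active_mg, carrier_fraction):
    """Carrier mass (mg) such that carrier / (carrier + active) == carrier_fraction.

    Assumes a two-component composition (active agent plus carrier only).
    """
    if not 0.0 <= carrier_fraction < 1.0:
        raise ValueError("carrier_fraction must be in [0, 1)")
    return active_mg * carrier_fraction / (1.0 - carrier_fraction)


# Carrier share at the two ends of the stated 5-95 percent range, plus a midpoint.
for fraction in (0.05, 0.50, 0.95):
    carrier = carrier_mass_mg(100.0, fraction)
    print(f"{fraction:.0%} carrier: {carrier:.1f} mg carrier per 100.0 mg active")
```

The relation is nonlinear in the carrier share: moving from 50 percent to 95 percent carrier raises the carrier mass per unit of active agent nineteen-fold.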

It will be understood, however, that the specific dose level for any
particular patient will depend on a variety of factors. These factors include the
activity of the specific compound employed; the age, body weight, general health,
sex, and diet of the subject; the time and route of administration and the rate of
excretion of the drug; whether a drug combination is employed in the treatment;
and the severity of the particular disease or condition for which therapy is sought.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from
Micromonospora megalomicea

Experimental Procedures

Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the
ATCC collection and cultured according to recommended protocols. For isolation
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion
of the actinorhodin biosynthetic gene cluster, was used as the host for expression
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985).
S. erythraea NRRL2338 was used for expression of the megosamine genes. S.
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for
preparation of protoplasts.

Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli was performed by
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts
of S. lividans and S. erythraea were generated for transformation by plasmid DNA
using the standard procedure. S. lividans transformants were selected on R5 using
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay.

Isolation of the meg gene cluster

A cosmid library was prepared in SuperCos (Stratagene) from M.
megalomicea total DNA partially digested with Sau3A I, and introduced into E.
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the
cosmid library by colony hybridization. Several colonies which hybridized with
the probes were further analyzed by sequencing the ends of their cosmid inserts
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences
revealed several colonies with DNA sequences highly homologous to genes from
the ery cluster. Together with restriction analysis, this led to the isolation of two
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of
the meg cluster. A 400 bp PCR fragment was generated from the left end of
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR
fragment generated from the right end of pKOS079-93A was used to reprobe the
cosmid library. Analysis of hybridizing colonies as described above resulted in
identification of two additional cosmids, pKOS079-138B and pKOS79-124B,
which overlap the previous two cosmids. BLAST analysis of the far left and right
end sequences of these cosmids indicated no homology to any known genes
related to polyketide biosynthesis and therefore indicates that the set of four
cosmids spans the entire megalomicin biosynthetic gene cluster.

DNA sequencing and analysis 

PCR-based double stranded DNA sequencing was performed on a
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was
made as follows: DNA was first digested with Dra I to eliminate the vector
fragment, then partially digested with Sau3A I. After agarose electrophoresis,
bands between 1-3 kb were excised from the gel and ligated with BamH I digested
pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was
sequenced by primer walking to extend the sequencing to the megT gene.
Sequence was assembled using the Sequencher (Gene Codes Corp.) software package
and analyzed with MacVector (Oxford Molecular Group) and the NCBI BLAST
server (www.ncbi.nlm.nih.gov/BLAST/).

Plasmids 

Plasmid pKOS108-6 is a modified version of pKAO127'kan' (Ziermann
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes
between the Pac I and EcoR I sites have been replaced with the megAI-III genes.
This was done by first substituting a synthetic nucleotide DNA duplex
(5'-TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21),
complementary oligo
5'-AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO: 22))
between the Pac I and EcoR I sites of the pKAO127'kan' vector fragment.
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the
megAI-II genes was inserted into the EcoR I and Bgl II sites of the resulting plasmid to
generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the megAIII
and part of the megCII gene was subcloned from pKOS079-93A and excised as a
Bgl II/Xba I fragment and ligated into the corresponding sites of pKOS024-84 to
yield the final expression plasmid pKOS108-06.
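
The design of the synthetic duplex can be checked computationally. The sketch below is an illustration, not part of the described procedure: it assumes the two oligonucleotides read exactly as given above (SEQ ID NO: 21 and 22), anneals them, and reports the single-stranded overhangs and the presence of the EcoR I, Bgl II, and Xba I recognition sequences used in the later cloning steps. The helper names are hypothetical.

```python
# Complement lookup for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Oligonucleotides as given in the text, written 5'->3'.
TOP = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"            # SEQ ID NO: 21
BOTTOM = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"   # SEQ ID NO: 22

# Standard recognition sequences for enzymes named in the cloning steps.
SITES = {"EcoR I": "GAATTC", "Bgl II": "AGATCT", "Xba I": "TCTAGA"}


def duplex_overhangs(top, bottom):
    """Split the bottom strand into (5' overhang, paired core, 3' overhang).

    Returns None if the top strand does not pair as one contiguous block.
    """
    core = reverse_complement(top)
    start = bottom.find(core)
    if start < 0:
        return None
    return bottom[:start], core, bottom[start + len(core):]


five_prime, core, three_prime = duplex_overhangs(TOP, BOTTOM)
sites_present = {name: site in TOP for name, site in SITES.items()}

print("5' overhang on bottom strand:", five_prime)
print("3' overhang on bottom strand:", three_prime)
print("sites in duplex:", sites_present)
```

Under these assumptions the bottom strand carries a four-base 5' AATT extension, which matches an EcoR I cohesive end, and a two-base 3' extension, which would be expected for ligation to the Pac I end of the vector.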

The megosamine integrating vector, pKOS97-42, was constructed as
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid
pKOS97-42.

Production and analysis of secondary metabolites 
Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKAO127'kan' were essentially as previously described (Xue et al., 1999). S.
erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brunker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and the mixture was centrifuged again. Erythromycins and megalomicins
were detected by electrospray mass spectrometry and quantity was determined by
evaporative light scattering detection (ELSD). The LC retention time and mass
spectra of erythromycin and megalomicins were identical to known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) was
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea and covers >100 kb of the genome. A contiguous 48 kb segment
which encodes the megalomicin PKS and several deoxysugar biosynthetic genes
was sequenced and analyzed. The segment contains 17 complete ORFs as well as
an incomplete ORF at each end, organized as shown in Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain,
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, a conserved Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3 is observed in meg DEBS, which has
been proposed to account for its inactivity in ery DEBS (Donadio et al., 1991).

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).

Table 2. Deduced functions of genes identified in the megalomicin gene cluster.
a. Determined by BLASTX analysis using default parameters.

Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made accordingly for: MegCII and MegDVI, two putative
3,4-isomerases similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; and MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.
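
The assignments in the paragraph above can be summarized as a small lookup structure. The sketch below is only an organizational aid: the putative functions are taken from the text, while the grouping rule (the letter after "meg" read as B for mycarose, C for desosamine, D for megosamine) is an assumption inferred from the gene names, and megT is left unassigned because the text does not place it on sequence grounds alone.

```python
from collections import defaultdict

# Putative functions of the deoxysugar-related ORFs, as stated in the text.
PUTATIVE_FUNCTIONS = {
    "megBV": "glycosyltransferase",
    "megCIII": "glycosyltransferase",
    "megDI": "glycosyltransferase",
    "megCII": "3,4-isomerase",
    "megDVI": "3,4-isomerase",
    "megBII": "2,3-reductase",
    "megDVII": "2,3-reductase",
    "megBIV": "4-ketoreductase",
    "megDV": "4-ketoreductase",
    "megT": "2,3-dehydratase",
    "megDII": "aminotransferase",
    "megDIII": "dimethyltransferase",
    "megDIV": "3,5-epimerase",
}

# Assumed naming convention (inferred, not stated as a rule in the text).
PATHWAY_BY_LETTER = {"B": "mycarose", "C": "desosamine", "D": "megosamine"}


def pathway_of(gene):
    """Return the pathway implied by the gene name, or 'unassigned'."""
    return PATHWAY_BY_LETTER.get(gene[3], "unassigned")


genes_by_pathway = defaultdict(list)
for gene, function in PUTATIVE_FUNCTIONS.items():
    genes_by_pathway[pathway_of(gene)].append((gene, function))

for pathway, members in genes_by_pathway.items():
    print(pathway, members)
```

The thirteen entries correspond to the 12 complete ORFs and 1 partial ORF that the BLAST analysis associated with deoxysugar biosynthesis.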

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). EryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKAO127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKAO127'kan' encoding ery DEBS and pKOS108-06
encoding meg DEBS, were introduced in Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30-40 mg/L. In addition, both PKSs produced small amounts (~5%) of
8,8a-deoxyoleandolide, which results from the priming of the PKS with acetate instead
of propionate (Kao et al., 1994b). This observation indicates that the loading AT
domains of the PKSs display similar relaxed specificities towards starter units.

Conversion of erythromycin to megalomicin in S. erythraea. 

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/pKOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicin A and C2 were also
detected in smaller amounts. The presence of the megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyltransferase, MegY, which
is present in the integrated meg fragment.

Discussion 

The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis and its origin in the ery cluster
has not been determined.

Upstream of the common meg/eryBIV and BV genes, the gene clusters
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery
gene cluster (Pereda et al., 1997), contains the remaining genes required for
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI)
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since
introduction of this meg DNA segment into S. erythraea results in production of
megalomicins, it is clear that these genes encode the functions for
TDP-megosamine biosynthesis and transfer to its putative substrate erythromycin C, and
to acylate megalomicin A (Figure 8). The remaining region upstream of megDVI
should therefore encode genes only for mycarose and desosamine biosynthesis.

Olano et al. (Olano et al., 1999) have recently described a pathway for
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius.
Their pathway proposes four steps from the intermediate
TDP-4-keto-6-deoxyglucose controlled by the gene cluster dnmJQTUVZ, although the functions
for dnmQ and dnmZ could not be identified and the precise order of reactions in
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and
megDV, respectively (see Figure 10).

It is possible to describe a pathway to convert
TDP-2,6-dideoxy-3,4-diketo-D-hexose (or its enol tautomer), the last intermediate common to the
mycarose and megosamine pathways, to TDP-megosamine through the sequence
of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation
employing the genes megDIV, megDV, megDII, and megDIII. This employs the
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al.,
but in a different sequential order. However, it does not account for the megDVI
and megDVII genes since their activities are not required for this route. A parallel
pathway which employs these genes is also shown in Figure 10. In this alternate
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and
megDVI gene products, respectively. A unified single pathway that employs both
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined.
Because the entire gene set from megDVI through megDVII was introduced in S.
erythraea to produce TDP-megosamine, it is not possible to determine which, if
either, of the two alternative pathways is operative, but this can be addressed
through systematic gene disruption and complementation.

The 48 kb segment sequenced also contains genes required for synthesis of
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which
encodes a putative 3,4-isomerase catalyzing the first step in the committed TDP-desosamine
pathway, appears to be translationally coupled to megAIII, almost exactly as its
erythromycin counterpart, eryCII, was found translationally coupled to eryAIII
(Summers et al., 1997). The high degree of similarity between MegCII and EryCII
suggests that the pathways to desosamine in the megalomicin- and
erythromycin-producing organisms are most likely the same. Similarly, the finding that megBII
and megBIV, encoding a 2,3-reductase and 4-ketoreductase, have close
homologs in the mycarose pathway for erythromycin also suggests that
TDP-L-mycarose synthesis in the two host organisms is the same.

Of interest are the two genes that encode putative 2,3-reductases, megBII
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore,
the lower degree of similarity between MegDVII and either EryBII or MegBII
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative
2,3-dehydratase, is also related to a gene in the ery mycarose pathway, eryBVI. In S.
erythraea, the proposed intermediate generated by EryBVI represents the first
committed step in the biosynthesis of mycarose (Figure 10). However, the
proposed pathways in Figure 10 suggest this may be an intermediate common to
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT
is named following the designation of the equivalent gene in the daunosamine
pathway, dnmT (Olano et al., 1999).

The preferred host-vector system for expression of meg DEBS described
here has been used previously for the heterologous expression of modular PKS
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999),
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the
generation of novel polyketide backbones where domains have been removed,
added or exchanged in various combinations (McDaniel et al., 1999). Recently,
hybrid polyketides have been generated through the co-expression of subunits
from different PKS systems (Tang et al., 2000).

Expression of the megDVI-megDVII segment in S. erythraea and the
corresponding production of megalomicins in this host establishes the likely order
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to
produce megalomicin in a more genetically friendly host organism, leading to the
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB
analogs have been produced by combinatorial biosynthesis using the ery PKS
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be
significantly increased above the 5 mg/L obtained from M. megalomicea by
introducing the genes into an industrially optimized strain of S. erythraea, many of
which can produce as much as 10 g/L of erythromycin.

Example 2

Stabilizing meg PKS Expression Plasmid by Codon Engineering 

Materials and methods

All bacterial strains were cultured and transformed as described in
Example 1.

Fermentation of Streptomyces and diketide feeding

Primary Streptomyces transformants were picked and placed in 6 mL of
TSB liquid medium with 50 ng/L of thiostrepton and grown at 30°C. When the
culture showed some growth (3-4 days), it was transferred into a 250 mL flask
containing 50 mL of R6 medium (pH 7.0) with 25 ug/L of thiostrepton and 1 g/L of
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester)
and placed in a 30°C incubator for 7 days.



Changing codons and making plasmids

There are several identical sequences in the coding sequences for module 2
and module 6 of the megalomicin PKS gene cluster. Expression plasmids
containing the full length megalomicin PKS appeared to be somewhat unstable
and subject to deletion in recA+ strains like ET12567 and Streptomyces by
intra-plasmid homologous recombination. To prevent significant homologous
recombination and so stabilize expression plasmids, the codons of two regions of
the module 6 coding sequence that are identical to regions in the module 2 coding
sequence were changed without changing the sequence of the protein encoded. The
two regions changed in module 6 were from base 26,739 to base 27,267 and
from base 27,697 to base 27,987, which were identical to the region
from base 6810 to base 7338 and the region from base 7778 to
base 8068, respectively. The start codon of the loading domain of the meg PKS
was set to be the 1st base. These sequences are shown below.



> 6810-7338 Sequence in Module 2

TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 
GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23) 

> 26736-27267 Sequence in Module 6 

CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 

GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG 
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 
> 26736-27267 Sequence with Codon Changes 
CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC 
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC 
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25) 



> 6978-7337 Sequence in Module 2 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 

TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26) 

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTCTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 27) 

> 27697-27987 Sequence with Codon Changes 

CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA 
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28) 
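
The effect of the codon changes can be checked computationally. The following is a minimal Python sketch, not part of the original disclosure. It takes the first 60 bases of the module 6 region (SEQ ID NO: 24) and of its recoded counterpart (SEQ ID NO: 25), assumes that this excerpt starts in the reading frame, and compares the encoded peptide, the nucleotide identity, and the longest uninterrupted identical stretch. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 60 bases of the module 6 region and of the codon-changed version.
ORIGINAL = "CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT"
RECODED = "CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC"


def translate(dna):
    """Translate a DNA string in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of aligned positions at which the two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


def longest_identical_run(a, b):
    """Length of the longest stretch of consecutive identical positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    print(translate(ORIGINAL) == translate(RECODED))
    print(identity(ORIGINAL, RECODED), longest_identical_run(ORIGINAL, RECODED))
```

Short identical stretches matter here because homologous recombination depends on runs of uninterrupted identity, so a check of this kind looks at the longest shared run as well as the overall identity.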



Three pieces of DNA from the two regions above were synthesized and verified by 
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II-TOPO, as 
shown in Table 3 below. 

Table 3. Plasmids containing synthesized DNA 

Plasmid        Cloning sites and positions in meg PKS 
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base 
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base 
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base 
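
The coordinates in Table 3 can be cross-checked against the regions named in the text. The sketch below is an illustration only: it assumes 1-based inclusive base numbering, and the helper names are hypothetical. It confirms that each module 6 region has the same length as its module 2 counterpart and that the synthesized fragments tile the recoded regions without gaps.

```python
# Base coordinates (1-based, inclusive) as stated in the text and in Table 3.
MODULE6_REGIONS = [(26739, 27267), (27697, 27987)]
MODULE2_REGIONS = [(6810, 7338), (7778, 8068)]
SYNTHETIC_FRAGMENTS = {
    "pKOS97-1613": (26739, 26947),
    "pKOS97-1622": (26947, 27267),
    "pKOS97-1628": (27697, 27987),
}


def length(region):
    """Number of bases in an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


def covers(region, fragments):
    """True if the fragments together span the region with no gap."""
    start, end = region
    covered_to = start - 1
    for frag_start, frag_end in sorted(fragments):
        if frag_start > covered_to + 1:  # a gap precedes this fragment
            return False
        covered_to = max(covered_to, frag_end)
    return covered_to >= end


if __name__ == "__main__":
    for m6, m2 in zip(MODULE6_REGIONS, MODULE2_REGIONS):
        print(m6, length(m6), m2, length(m2))
```

The first two fragments share base 26,947, which is consistent with both ending at the same BamHI site.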



Assembly of the expression plasmid 

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-BsmI 
fragment of pKOS97-1622, and BsmI-PstI-linearized pKOS97-90 produced 
pKOS97-151. Then, insertion of the SfaNI-FseI fragment of pKOS97-1628 
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of 
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a, 
producing pKOS97-160. 

The final expression plasmid (in pRM5), pKOS97-162, resulted from inserting the 
BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of pKOS108-04. 

Another expression plasmid, pKOS97-152a, was made by a four-fragment 
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of 
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of 
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector). 

Tests of the constructed plasmids showed that the plasmids containing the 
modified coding sequences were more stable than plasmids containing the unmodified 
coding sequence. 

Example 3 
Construction of Ole-Meg Hybrid PKS 
Construction of pRM1-based pKOS98-48 for the expression of OlePKS modules 
1-4. 

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt 
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified 
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID 
NO: 29) and N98-38-3 
(5'-CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID 
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and 
the engineered XbaI (TCTAGA) and EcoRI (GAATTC) sites at its 3'-end, following the 
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five 
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment 
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI- and EcoRI-treated 
PCR fragment were ligated into Litmus 28 at the EcoRI site via a three-fragment 
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was 
cloned into pKOS38-174, a pRM1-derived plasmid containing oleAI and nt 1 to nt 
2960 of oleAII, to give pKOS98-48. 

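
The restriction sites described for the two primers can be located by a simple string search. The sketch below is illustrative: it uses the primer sequences as given in the text with spaces removed, and the well-known recognition sequences of NotI, XbaI, and EcoRI. Because all three sites are palindromic, searching one strand is enough. The function name is hypothetical.

```python
# Recognition sequences (5' to 3') of the enzymes named in the text.
SITES = {"NotI": "GCGGCCGC", "XbaI": "TCTAGA", "EcoRI": "GAATTC"}

# Primer sequences as given in the text, with spaces removed.
PRIMERS = {
    "N98-38-1": "GAACAACTCCTGTCTGCGGCCGCG",
    "N98-38-3": "CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    hits = {}
    for enzyme, site in sites.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

In N98-38-3 the EcoRI and XbaI sites sit at the 5' end of the primer, which places them at the 3' end of the amplified fragment, as the text describes.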
Construction of pSET152-based pKOS98-60 for the expression of megPKS 
modules 5-6. 

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR 
amplified with primers N98-40-3 
(5'-TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGCGGCATGACCG-3') 
(SEQ ID NO: 31) and N98-40-2 (5'-AACGCCTCCCAGGAGATCTCCAGCA-3') 
(SEQ ID NO: 32). A PacI site and an NdeI site, as well 
as the ribosome binding site, were introduced at the 5'-end of the megAI start 
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06, replacing 
the 22-kb PacI-BglII fragment, to yield pKOS98-55. The 10-kb PacI-XbaI 
fragment containing the megAIII gene and the annealed oligos N98-23-1 
(5'-AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2 
(5'-CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to the PacI- and EcoRI- 
treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give 
pKOS98-60. 
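
The role of the annealed oligos N98-23-1 and N98-23-2 as a linker can be illustrated with a short sketch. It assumes that each oligo contributes a four-base 5' overhang and that the remaining bases pair perfectly; the function names are hypothetical. Under these assumptions the duplex carries an AATT overhang, compatible with an EcoRI-cut end, and a CTAG overhang, compatible with an XbaI-cut end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top, bottom, overhang=4):
    """Anneal two oligos that each carry a 5' single-stranded overhang.

    Returns (top 5' overhang, double-stranded core, bottom 5' overhang),
    or None if the remaining bases are not perfectly complementary.
    """
    core_top = top[overhang:]
    core_bottom = reverse_complement(bottom[overhang:])
    if core_top != core_bottom:
        return None
    return top[:overhang], core_top, bottom[:overhang]


N98_23_1 = "AATTCATAGCCTAGGT"
N98_23_2 = "CTAGACCTAGGCTATG"

if __name__ == "__main__":
    print(anneal(N98_23_1, N98_23_2))
```

The double-stranded core also contains the sequence CCTAGG, the recognition site of AvrII, although the text does not say whether that site was used.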

Example 4 

Conversion of Erythronolides to Erythromycins 

A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of 
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a 
three-day-old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant) 
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated 
at 30°C for four days. The agar is chopped and then extracted three times with 100 
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and 
evaporated. The crude product is purified by preparative HPLC (C-18 reversed 
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are 
analyzed by mass spectrometry, and those containing pure compound are pooled, 
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved 
in water and extracted three times with equal volumes of ethyl acetate. The 
organic extracts are combined, washed once with saturated aqueous NaHCO3, 
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The 
product is a glycosylated and hydroxylated compound corresponding to 
erythromycin A, B, C, and D but differing therefrom as the compound provided 
differed from 6-dEB. 

Example 5 

Measurement of Antibacterial Activity 

Antibacterial activity is determined either by disk diffusion assays with 
Bacillus cereus as the test organism or by measurement of minimum inhibitory 
concentrations (MIC) in liquid culture against sensitive and resistant strains of 
Staphylococcus pneumoniae. 
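
As an illustration of how an MIC is read from a liquid-culture dilution series, the sketch below takes a mapping from tested concentration to observed growth and returns the lowest concentration at which growth is absent at that concentration and at every higher one tested. The text does not specify the dilution scheme or readout, so the twofold series and the growth calls shown here are hypothetical.

```python
def minimum_inhibitory_concentration(growth_by_concentration):
    """Return the MIC from a {concentration: growth_observed} mapping.

    The MIC is taken as the lowest tested concentration with no growth,
    provided that no higher tested concentration shows growth. Returns
    None if growth occurs even at the highest concentration tested.
    """
    mic = None
    for conc in sorted(growth_by_concentration, reverse=True):
        if growth_by_concentration[conc]:
            break  # growth seen; lower concentrations cannot be the MIC
        mic = conc
    return mic


# Hypothetical twofold dilution series: concentration (ug/mL) -> visible growth
EXAMPLE_SERIES = {64: False, 32: False, 16: False, 8: False, 4: True, 2: True, 1: True}

if __name__ == "__main__":
    print(minimum_inhibitory_concentration(EXAMPLE_SERIES))
```

Comparing the value obtained for a sensitive strain with that for a resistant strain gives a simple measure of whether a compound retains activity against the resistant strain.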

Example 6 
Evaluation of Antiparasitic Activity 
Compounds can initially be screened in vitro using cultures of P. falciparum 
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian 
cell toxicity can be determined in FM3A or KB cells. Compounds can also be 
screened for activity against P. berghei. Compounds are also tested in animal studies 
and clinical trials to test the antiparasitic activity broadly (against malaria, 
trypanosomiasis, and leishmaniasis). 

The invention having now been described by way of written description 
and example, those of skill in the art will recognize that the invention can be 
practiced in a variety of embodiments and that the foregoing description and 
examples are for purposes of illustration and not limitation of the following 
claims. 

Claims 

1. An isolated nucleic acid comprising a nucleotide sequence 
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin 
modification enzyme. 

2. The isolated nucleic acid of claim 1, which encodes a PKS open 
reading frame (ORF) selected from the group consisting of megAI, megAII and 
megAIII. 

3. The isolated nucleic acid of claim 1, wherein the PKS domain is 
selected from the group consisting of a TE domain, a KS domain, an AT domain, 
an ACP domain, a KR domain, a DH domain, and an ER domain. 

4. The isolated nucleic acid of claim 1, wherein the nucleic acid 
comprises the coding sequence for a loading module, a thioesterase domain, and 
all six extender modules of megalomicin PKS. 

5. The isolated nucleic acid of claim 1, which encodes a megalomicin 
modification enzyme that is involved in the conversion of 6-dEB into a 
megalomicin. 

6. The isolated nucleic acid of claim 5, which encodes a megalomicin 
modification enzyme that is involved in the biosynthesis of mycarose, 
megosamine or desosamine. 

7. The isolated nucleic acid of claim 1, wherein the nucleic acid 
codons of homologous regions within the PKS or the megalomicin modification 
enzyme coding sequence have been changed to reduce or abolish the homology 
without changing the amino acid sequences encoded by said changed nucleic acid 
codons. 






8. The isolated nucleic acid of claim 1, which isolated nucleic acid 
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in 
SEQ ID NO: 1. 

9. A polypeptide, which is encoded by the isolated nucleic acid 
fragment of claim 1. 

10. A recombinant DNA expression vector, comprising the isolated 
nucleic acid of claim 1 operably linked to a promoter. 

11. A recombinant host cell, comprising the recombinant DNA 
expression vector of claim 10. 

12. The recombinant host cell of claim 11, which is a Streptomyces or 
Saccharopolyspora host cell. 

13. A recombinant host cell of claim 11, which comprises: 

a) at least two separate autonomously replicating recombinant DNA 
expression vectors, each of said vectors comprises a recombinant DNA compound 
encoding a megalomicin PKS domain or a megalomicin modification enzyme 
operably linked to a promoter; or 

b) at least one autonomously replicating recombinant DNA expression 
vector and at least one modified chromosome, each of said vector(s) and each of 
said modified chromosome comprises a recombinant DNA compound encoding a 
megalomicin PKS domain or a megalomicin modification enzyme operably linked 
to a promoter. 

14. A hybrid PKS that comprises a polypeptide of claim 9 and is 
composed of at least a portion of a megalomicin PKS and at least a portion of a 
second PKS for a polyketide other than megalomicin. 

15. The hybrid PKS of claim 14, wherein the second PKS is selected 
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS 
PKS. 

16. The hybrid PKS of claim 15 that is composed of the megAI and 
megAII gene products and the oleAIII gene product. 

17. The hybrid PKS of claim 16, wherein the KS domain of module 1 
of the megAI gene product has been inactivated by mutation. 

18. A method of producing a polyketide, which method comprises 
growing the recombinant host cell of claim 11 under conditions whereby the 
megalomicin PKS domain encoded by the recombinant expression vector is 
produced and the polyketide is synthesized by the cell, and recovering the 
synthesized polyketide. 

19. A recombinant host cell that comprises a recombinant expression 
vector that encodes a megalomicin modification enzyme. 

20. The recombinant host cell of claim 19 that produces megosamine 
and can attach megosamine to a polyketide, wherein said host cell, in its naturally 
occurring non-recombinant state, cannot produce megosamine. 

[Sequence listing: nucleotide sequence pages, illegible in the scanned source] 

POtn-POrdrdOrdOP>0-P-PtnO.P-POOOtnO 
tn tn tn tn -P tn -P tn tn rd P> tn U tn O U tn tn tn rd tn tn tn 
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tn P o tn O O tn O O tn O fd tn tn O tn O tn tn rd tn O tn 

P tn tn tn -P p> O tnOO-P fd -P fd O-P-P tn rd tn fd O O 
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fdOOfdtnfdtnt^-PtnfdfdOfdP ) OfdOtTifd(dfdfd 
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U tn rd cd Cn -P tn tn rd U-P U U -P -p -p -p o is jj m m j_> 
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£> U -P O U rd -P tn U U U & &U U tn tn tn O U u u tn 
U tn rd U U Cn -P 4-> U 4-> D^mo^-P-POUO-P 

2 2'£iiSti£2l-ti2L CT,u ° u Cn o U tn O fd tn to tn 

2.2.^^ C TS? t ? tr,CnCno ° 10 U O U Cn to -P u O Cn to 

2* £ 2 o-Pt^-p rd -p tn rd u to u <u u cn u to to -p to -p 
q tn rd di tji o O o-p o tn cn tn rd tn rd u cn o to & o n 

2w^t^&22,2 &uoao ^SSu^out£uti 

tn U CD -P to U tn -p -P -P -P tn U tn U -P rd -P O O -P fd rrt 
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O CntoO O O U Cn to O rd-P CnO-P to Cn 4_> to O tn to n ^? 
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CriU U Cn to Cn U Cn rd -P to rd Cn U U to rd to U rd to to C) 

•£ti 2 £-£2 2. U ° ^ u ^ P to to 6 to rd P to U U to 
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SEQUENCE LISTING 

<110> Kosan Biosciences, Inc. 

<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned 
<141> Herewith 

<150> US 60/158,305 
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 47981 
<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS
<222> (1)...(144)
<223> megBVI (megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase;
SEQ ID NO: 2 = translated amino acid sequence

<221> CDS
<222> (928)...(2061)
<223> megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase,
TDP-4-keto-6-deoxyhexose 3,4-isomerase;
SEQ ID NO: 3 = translated amino acid sequence

<221> CDS
<222> (2072)...(3382)
<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyltransferase;
SEQ ID NO: 4 = translated amino acid sequence

<221> CDS
<222> (3462)...(4634)
<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5 = translated amino acid sequence

<221> CDS
<222> (4651)...(5775)
<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog),
TDP-3-keto-6-deoxyhexose 3-aminotransaminase;
SEQ ID NO: 6 = translated amino acid sequence

<221> CDS
<222> (5822)...(6595)
<223> megDIII, daunosaminyl-N,N-dimethyl transferase (eryCVI homolog);
SEQ ID NO: 7 = translated amino acid sequence

<221> CDS
<222> (6592)...(7197)
<223> megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3,5-epimerase;
SEQ ID NO: 8 = translated amino acid sequence

<221> CDS
<222> (7220)...(8206)
<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog),
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 9 = translated amino acid sequence

<221> CDS
<222> (8228)...(9220)
<223> megBII-1 (megDVII), TDP-4-keto-L-6-deoxy-hexose 2,3-reductase;
SEQ ID NO: 10 = translated amino acid sequence

<221> CDS
<222> (9226)...(10479)
<223> megBV, mycarosyl transferase, mycarose glycosyltransferase;
SEQ ID NO: 11 = translated amino acid sequence

<221> CDS
<222> (10483)...(11424)
<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12 = translated amino acid sequence

<221> CDS
<222> (12181)...(22821)
<223> megAI; SEQ ID NO: 13 = translated amino acid sequence

<221> misc_feature
<222> (12505)...(13470)
<223> megAI, AT-L

<221> misc_feature
<222> (13576)...(13791)
<223> megAI, ACP-L

<221> misc_feature
<222> (13849)...(15126)
<223> megAI, KS1

<221> misc_feature
<222> (15427)...(16476)
<223> megAI, AT1

<221> misc_feature
<222> (17155)...(17694)
<223> megAI, KR1

<221> misc_feature
<222> (17947)...(18207)
<223> megAI, ACP1

<221> misc_feature
<222> (18268)...(19548)
<223> megAI, KS2

<221> misc_feature
<222> (19876)...(20910)
<223> megAI, AT2

<221> misc_feature
<222> (21517)...(22053)
<223> megAI, KR2

<221> misc_feature
<222> (22318)...(22575)
<223> megAI, ACP2

<221> CDS
<222> (22867)...(33555)
<223> megAII; SEQ ID NO: 14 = translated amino acid sequence

<221> misc_feature
<222> (22957)...(24237)
<223> megAII, KS3

<221> misc_feature
<222> (24544)...(25581)
<223> megAII, AT3

<221> misc_feature
<222> (26230)...(26733)
<223> megAII, KR3 (inactive)

<221> misc_feature
<222> (26998)...(27258)
<223> megAII, ACP3

<221> misc_feature
<222> (27393)...(28590)
<223> megAII, KS4

<221> misc_feature
<222> (28897)...(29931)
<223> megAII, AT4

<221> misc_feature
<222> (29953)...(30477)
<223> megAII, DH4

<221> misc_feature
<222> (31396)...(32244)
<223> megAII, ER4

<221> misc_feature
<222> (32257)...(32799)
<223> megAII, KR4

<221> misc_feature
<222> (33052)...(33312)
<223> megAII, ACP4

<221> CDS
<222> (33666)...(43271)
<223> megAIII; SEQ ID NO: 15 = translated amino acid sequence

<221> misc_feature
<222> (33780)...(35027)
<223> megAIII, KS5

<221> misc_feature
<222> (35385)...(36419)
<223> megAIII, AT5

<221> misc_feature
<222> (37068)...(37604)
<223> megAIII, KR5

<221> misc_feature
<222> (37860)...(38120)
<223> megAIII, ACP5

<221> misc_feature
<222> (38187)...(39470)
<223> megAIII, KS6

<221> misc_feature
<222> (39795)...(40811)
<223> megAIII, AT6

<221> misc_feature
<222> (41406)...(41936)
<223> megAIII, KR6

<221> misc_feature
<222> (42168)...(42425)
<223> megAIII, ACP6

<221> misc_feature
<222> (42585)...(43271)
<223> megAIII, TE

<221> CDS
<222> (43268)...(44344)
<223> megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase;
SEQ ID NO: 16 = translated amino acid sequence

<221> CDS
<222> (44355)...(45623)
<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase;
SEQ ID NO: 17 = translated amino acid sequence

<221> CDS
<222> (45620)...(46591)
<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase,
TDP-4-keto-6-deoxyglucose 2,3-dehydratase;
SEQ ID NO: 18 = translated amino acid sequence

<221> CDS
<222> (46660)...(47403)
<223> megH, TEII; SEQ ID NO: 19 = translated amino acid sequence

<221> CDS
<222> (47411)...(47980)
<223> megF, C-6 hydroxylase; SEQ ID NO: 20 = translated amino acid sequence

<400> 1

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60
gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120
atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180

taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240 

cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 

gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 

tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480 

ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540 

catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 

cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 

ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 

gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 

ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840 

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 

tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020 

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140 

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260 

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320 

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 1440

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560 

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740 

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340 

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640 

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060 

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240 

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780
tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840

tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900 

tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960 

tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020

cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080 

ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140 

tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200

ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260

ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320

ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380

tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440

ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500

tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560

gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620

cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680

gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740

agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800

atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860

gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920

ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980

ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 5040 

gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100 

ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160 

gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220 

tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280 

ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340 

cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400 

gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460

tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520 

gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580 

tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640 

gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700

atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 5760

atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820 

catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880 

catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940

cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000 

gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060 

gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120 

cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180 

caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240

cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300 

cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360 

caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420

ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480

ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540 
cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600 
gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660 
ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720 
ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780 
ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840 
gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900 
ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960 
tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020
gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080
ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140
atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200 
cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260 
cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320 

cgtgccgtcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380
agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440
gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 7500

cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 7560 

cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 7620 

gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 7680 

ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 7740 

ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 7800

cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 7860

cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 7920 

acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 7980 

ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 8040 

tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 8100 

ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 8160

ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 8220 

gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 8280 

ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 8340 

gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 8400

cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 8460

ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 8520 

ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 8580 

cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 8640

ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 8700 

gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 8760

tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 8820 

ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 8880 

cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 8940 

ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 9000 

cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 9060 

tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 9120 

cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 9180 

cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 9240

gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 9300 

cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 9360 

gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 9420

cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 9480

cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 9540 

ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 9600 

caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 9660 

ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 9720 

cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 9780

ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 9840

gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 9900 

gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 9960 

gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 10020 

ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 10080 

gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 10140

gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 10200 

gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 10260 

gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 10320 

gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 10380 

cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 10440

gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 10500 

gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 10560 

cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 10620 

ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct 10680 

ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 10740 

gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 10800 

ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 10860 

ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 10920 

gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 10980 

ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 11040

ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 11100
ccgcagcgcg 
cgtggcggcg 
gtcggtgtcg 
ctcgatcccg 
tcggagaggg 
gagaagtgtg 
tccatttgtt 
cctctttcga 
tgtcggggaa 
gatccgggtc 
ttttcggcga 
ggccgcaccg 
ccgtgccgac 
ccggcggcca 
tgatcgacac 
tcgcgctttc 
agcgaagagt 
tttcttcagc 
gtggttgacg 
ccgtggcccc 
gcatatctcg 
gagacacgcg 
ctgaccgccg 
gtcgcccgac 
ggcatggcga 
gagcgggcct 
cacctgcgcc 
gccctgtggc 
ctggccgccg 
ctgtggagcc 
tccccggccg 
gtcaacggtc 
gccgagctgg 
tcggcgcagg 
ggcgactccg 
ctcggcgccg 
cgtgcggtcc 
gcggcctccc 
ctgcaacgcg 
ggtggcgtga 
tcggccgtcg 
gaccacgtac 
gggcgggagg 
gtgcagctgc 
tacgaccacc 
ccggggcggg 
gtggtcgcca 
ctgctggccg 
gactcgctgt 
ttcctcaccg 
ctggccgtcg 
gccgggatcc 
ccccaggagt 
accgggacca 
ccggcgatca 
cagtcgctgc 
acaccgggca 
aaggcgttct 
ctggaacggc 
accgctgtca 
gtccgggtga 



gcgacgaggt cgcgcatgat gcccgcgttg 
ctgcgccagg tcgacccgcc ggcggcgtag 
gcgacgacct gcgcgacccg gccgggttcg 
gcgctgcctg gtggctggtc gcgagacccg 
tgtgtggtaa attcgcgaag aagggcgctt 
acatgtcttg tcatctacta atgcattccg 
ccccccaggg tggtgtcggg tgacaaatcc 
gcgggtgctg aggcttcccg cgtaccctcg 
agggcggatc gaggagttcg gtagggcgtc 
gacgccccga cgcgtgacag ggcgtcgatc 
tggtcgcaga ttcctcccga cgtggtggac 
tcggtggcct cgtcgggggt gtcggagacc 
cagggtcggt ccgtcgccga ggtgggtcac 
ccgcccgatc gtgcccacct tcgcctccgc 
ttccggcgac gctatcaccg gagcattccc 
caaacaggga aaacagcagc tcacagcggt 
ctcgatgggg tcaaggtgaa ttctgtcaca 
caccctcgac gttcatacaa ttggccggca 
tgcccgatct actcggcacc cggactccgc 
tgtgcggtca caacgaaccg gagctgcggg 
aaggcatttc cgaggatgac gtggtggccg 
cgcaggacgg gccgcaccgc gccgtcgtcg 
cgctcgccgc cctcgcccag ggccgcccac 
ccacggcacc ggtggtgttc gtcctgcccg 
cccgactgct cgccgagtcg cccgtcttcg 
tcgacgaggt caccgactgg tcgttgaccg 
gcgtcgaggt ggtccagccc gcgctcttcg 
ggtcgttcgg ggtgcgaccc gacgccgtac 
ccgaggtctg cggcgccgtc gacgtcgagg 
gcgagatggt cccactggtg ggccggggtg 
agctggcagc ccgggtcgag cggtgggacg 
'cccggtcggt gctgctcacc ggcgctcccg 
cggcacaggg cgtacgcgcc caggtcgtca 
tcgacgccgt cgccgagggc atgcgctcgg 
acgtgcccta ctacgccggc ctcaccggcg 
accactggcc gcgcagtttc cggctcccgg 
tggaactgca gcccggcacg ttcatcgagt 
tgcagcagac cctcgacgag gtcgggtccc 
accagggcgg tctgcggcgg ttcctgctcg 
cagtcgactg gaccgccgcc taccccgggg 
ccgtcgagac cgacgaggga ccctcgacgg 
tgcgcgcgcg gctgctggag atcgtcggcg 
tcgacgcccg ggccaccttc cgggaactgg 
ggacccgcct cgccacggcg accgggcggg 
cgaccccgca cgccctcacc gaggcgctgc 
gtgaggagac ggcacacccg acggaggccg 
tggcgtgccg gctgcccggc ggcgtcacct 
aggggcggga cgccgtcggc gggctgccca 
tccacccgga cccgacccgg tcgggcacgg 
gcgccacctc cttcgacgct gccttcttcg 
agccgcagca gcggatcacg ttggagctgt 
ccccgacgtc gttgcggacc tcccggaccg 
acggcccccg gctggccgag gggggtgagg 
ccaccagcgt cgcctccggt cgggtcgcct 
gcgtcgacac cgcctgctcg tcgtcgctcg 
ggcgcggcga gtcgacgatg gcgctcgccg 
tgctcgtgga cttcagtcgg atgaactccc 
cggccgccgc cgacgggttc ggcatggccg 



acgcgttcgg 
gcgaccagat 
agcaggtcga 
gtgcgcgcga 
ccgacgaatc 
atagccaccg 
ggcctcaggt 
gtggcctgcg 
gcggcgcgta 
cgtgccgccc 
tcattggttc 
gggtcgatcg 
cgtcgggtgg 
gggtaaatgc 
cggcaccacc 
tccaggcgcc 
gatgtttttg 
tctctaccaa 
acccagggcc 
cccgcgcccg 
tcggcgccgc 
tggcctcctc 
acccctcggt 
gtcagggcgc 
ccgcggcgat 
aggtcctgga 
cggtgcagac 
tcggacacag 
ccgccgcgcg 
acatggcggc 
acgacgtcgt 
agcccatcgc 
acgtgtcgat 
cgctgacctg 
ggcggctgga 
tgcgcttcga 
cgagcccgca 
cggccgcgat 
ccgtggcgca 
tgacccccgg 
agttcgactg 
ccgagacggc 
gcctcgactc 
atctgcacat 
tgcgcggccc 
aacccgacga 
caccggagga 
ccgaccgggg 
cgcaccagcg 
ggctgtcgcc 
cgtgggaggt 
gggtgttcgt 
gcgtcgaggg 
acaccctcgg 
tcgccgtgca 
gtggcgtgac 
tcgcccccga 
aaggcgcagg 
tgctcgccgt 
ccccgaacgg 
cgccccacac 



cctcgggcac 

gcacgacgac 

ctcgaaggtg 

cggcccgcag 

cagaaacgcc 

gcgcatggaa 

cggcctcaag 

ttcgggcggg 

ctccgggact 

gtaccgccgg 

tcccgggtgt 

ccgtccccgg 

acccggtccg 

ttcgtcgatc 

ggtcgatgcc 

gggcaatcct 

ttaaatgtac 

gggggagtga 

gctcccattc 

tcaattgcac 

cctcgcgcgc 

ggtcaccgag 

ggtacgcggt 

ccagtggccc 

gcgggcctgc 

ctcacccgag 

ctcactggcc 

catcggtgag 

ggccgccgcc 

ggtggcgctc 

gccggccggg 

acggcgggtc 

ggcggcgcac 

gttcgccccc 

cacccgggaa 

cgaggcgacc 

cccggtgctg 

cgtgccgacc 

ggcgtacacc 

ccacctgccg 

ggccgcgccc 

cgcgctcgcc 

ggtcctcgcg 

cgccatgctc 

gcaggaggag 

acccgtcgcc 

gttctgggag 

atgggacctg 

cgctggtggc 

acgggaggca 

gctggaacgc 

cggtctgatc 

ctacctgatg 

cctggagggg 

cctggcgtgc 

ggtgatgccg 

cggacggtcc 

gatgctcctg 

gatcaggggc 

ccgggcccag 

cgtcgacgtc 
gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 
gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 
gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 
gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 
tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 
cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 
gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 
ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 
cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 
gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 
gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 
cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 
ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 
tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 
cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 
tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 
cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 
gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 
atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 
gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 
gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 
tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 
ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 
ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 
cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 
accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 
ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 
gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 
ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 
gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 
ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 
gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 
ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 
ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 
rr.ggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 
ciqgagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 
ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 
cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 
cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 
acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 
ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 
qrccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 
gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 
gdcgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 
ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 
ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 
ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 
ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 
cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 
gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 
cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 
gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 
ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 
gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 
cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 
gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 
actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 
ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 
gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 
ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 
accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 
acgacgggca 

gcgttcttcg 

ctggagacca 

acggacaccg 

cccgaggacg 

cggatcgcgt 

tcgtcgcttg 

gcggtggcgg 

cagggcgcgt 

ggtctggggg 

gggcgtcggg 

gggttggcgg 

gcgggtgtgt 

ggggatccgg 

ggtccggtgg 

gtggtgggtg 

tgtcggggtg 

ggggtgcggg 

ggggtgtcgg 

gcggaacggc 

ccggtggtgc 

gaccacctgg 

gcccgccaac 

gaacggctgc 

tcgggtggtg 

cgggggttgt 

tcgtcggtgg 

ttggatcggg 

ttgtggcggt 

gcggcggcgg 

cgggcgcggg 

cgcgacgacg 

gcggtcaacg 

gtcgagcact 

cactccgcac 

ggccgcccgg 

gaactggacg 

gtcgaggcgc 

ctgtcgatgg 

ctggaacgcg 

cacggcgtac 

acctatccct 

gtcgccgact 

ctcgacggtc 

gaggtgcggg 

gtcaccgacc 

ggtgcggccg 

ctgtgggtgg 

gcgacggtgt 

ctgctggatc 

gccggtgccg 

cccaccccgg 

gggggcaccg 

cacctcgccc 

gacctgaccg 

tcggtcggcg 

cacgctgccg 

gacgtggtgg 

gaactgttcc 

tacgccgccg 

cccgccacct 



cccgtaccgc 

ggatctcgcc 

cctgggaggc 

gtgtcttcgt 

aggtcgacgg 

acgtgttggg 

tggcgttgca 

gtggggtgtc 

tggctccgga 

aggggtcggc 

tgttgggtgt 

cgccgtcggg 

cgggtgggga 

tggagttggg 

tggtgggttc 

tgatcaaggt 

ggttgtcggg 

ggtggccggt 

ggacgaatgc 

cggtggaggg 

tgtcggcaaa 

agacgcaccc 

gcttcgacag 

gcggcctcgc 

gtgtggtgtt 

tgtcggttcc 

tggggttttc 

tggatgtggt 

ggtgtggggt 

tggtggcggg 

cgttgcgggc 

tacagaagct

gccccgacgc 

gtgacgggat 

aggtcgagtc 

cgacggtgcc 

ccgactactg 

tggcagcgcg 

cggtcgggga 

acaccgacga 

ccgtggactg 

tccagggacg 

ggttccaccg 

gctggctggt 

ccgccctcgc 

gggtcggtga 

agaccctggc 

tcaccgtggg 

gggggttggc 

tgccgcagac 

aggaccaggt 

tcaccggagc 

ccggtctggg 

tggtcagccg 

ggctcggcgt 

ccctggtgca 

gtctgcccca 

ccgtgaaggt 

tgctgttctc 

gaaacgcctt 

cggtggcgtg 



cttcggcaac 
gcgtgaggcg 
gctggagaac 
gggcatgtcc 
ctacctgttg 
gttggagggg 
cgtggcggcg 
ggtgatggcc 
cggcaggtgc 
cttcgtcgtg 
ggtggtgggt 
ggtggcgcag 
tgtgggtgtg 
ggcgttgttg 
ggtgaaggcg 
ggtgttgggg 
gttggtggat 
gggtgtggat 
tcatgtggtg 
gtcgtcgcgg 
gaccgaaacc 
cgacgtcccg 
gcgcgcggtc 
cgggggcgaa 
tgtttttcct 
ggtgtttgtg 
ggtgttgggg 
gcagccggtg 
tgtgcctgcg 
ggtgttgtcg 
gttggccggc 
cctcgacagc 
ggtggtggtc 
cggggtccgg 
gctccgggag 
gttctactcc 
gtaccgcaac 
tgacctcacc 
gacgcttgcc 
cgtcgagcgc 
ggcggcggtc 
gcggttctgg 
ggtcgactgg 
ggtcgtaccc 
cgccggtggt 
cagcgacgcg 
gctgctgcga 
ggccgtcgcc 
ccttgtcgcc 
accggacccg 
agcggtccgc 
cgggccgtac
tgccgtcacc
gcgcgggccg 
acgggtgtcg 
ggagttgaca 
gcaggtgcca 
cgacggcgcg 
ctccggggcc 
cctggacgcc 
ggggctctgg 



ttcatgcccg 

ttggcgatgg 

gccggtatcc 

catcaggggt 

acaggcaaca 

ccggcgatca 

ggttcgttgc 

ggtccggagg 

aagcccttct 

ttgcagcggt 

tcggcggtga 

cagcgggtga 

gtggaggcgc 

gggacgtatg 

aatgtgggtc 

ttgggtcggg 

tggtcgtcgg 

ggggtgcgtc 

gtggcggagg 

gggttggtgg 

gccctgcacg 

atgaccgacg 

ctcctcgccg 

ccggggaccg 

ggtcagggtg 

gagtcggtgg 

gtgttggagg 

ttgttcgtgg 

gcggtggtgg 

gtgggtgatg 

cacggcggca 

ggcccctgga 

tccggcgacc 

gcccggacga 

gagctgctct 

accctcaccg 

ctgcgccacc 

acgttcgtcg 

gacgtggagt 

ttcctcacct 

ctcggctccg 

ctgcaccccg 

acggcgacgg 

gaggggtaca 

gccgagccgg 

gtggtgtcga 

cgactcgacg 

cccgccggtc 

tccctggaac 

cagctacgac 

gccgacgccg 

accgccccgg 

gcccgatggc 

ggcaccgccg 

gtgcactcct 

gcagccggtg 

ctgaccgaca 

gtgcacctgg 

ggggtgtggg 

ttcgcccgac 

gcggccgggg 



gggcgggcga 
atccgcagca 
ggcccgagtc 
acgccaccgg 
ccgcgagcgt 
ctgtggacac 
gttctgggga 
tgttcaggga 
cggacgaggc 
tgtcggtggc 
atcaggatgg 
ttcggcgggc 
atgggacggg 
gggtgggtcg 
atgtgcaggc 
ggttggtggg 
gtgggttggt 
ggggtggggt 
cgccggggtc 
gggtggttgg 
cccaggcacg 
tggtgtggac 
ccgaccggac 
gtgtggtgtc 
gtcagtgggt 
tggagtgtga 
gtcggtcggg 
tgatggtgtc 
gtcattcgca 
gtgcgcgggt 
tggcctcggt 
cggggaagct 
cccgagccgt 
tccccgtcga 
ccgtcctggc 
gtgggttcgt 
cggtgcggtt 
aggtcagccc 
ccgccgtcac
ccctcgccga 
gaaccctggt 
accgtggtcc 
ccaccgacgg 
cggacgacgg 
tggtgacgac 
tgctcgggct 
cacaggcgtc 
cggtgcagcg 
gcggacaccg 
cccggctggt 
tacacgcccg 
gcgggacgat 
tcgccgagcg 
gcgtcgacga 
gcgacgtcgg 
acgtggtccg 
tggacccggc 
ccgacctgtg 
gcagtgcccg 
accggcggga 
ggatgacagg 



gttcgacgcg 

gcggcacgcc 

gttgcgcggt 

ccgcccgaag 

cgcctccggt 

ggcgtgttcg 

ctgtggtctg 

gttctcccgg 

cgacggcttc 

ggtgcgggag 

ggcgagtaat 

gtggggtcgt 

gacgcggttg 

gggtggggtg 

ggcggcgggt 

tccgatggtg 

ggtggcggat 

gtcggcgttt 

ggtggtgggg 

tggtgtggtg 

tcgactcgcc 

gctgacgcag 

ccaggccgtg 

gggggtggcg 

ggggatggcg 

tgcggtggtg 

tgcgccgtcg 

gttggcgcgg 

gggggagatc 

ggtggcgttg 

acgccgaggc 

ggagatcgcc 

gaccgagctg 

ctacgcctcc 

cgggatcgag 

cgacggcacc 

ccacgccgcc 

gcaccccgtg 

tgtgggcacc 

ggcgcacgtc 

cgacctgccc 

gcgtgacgat 

gtcggcccga 

ctgggtcgtg 

ggtcgaggag 

ggccgacgac 

caccacccca 

ccccgaacag 

gtggaccggc 

cgaggcgctc 

tcggatcgtc 

cctcgtcacc 

cggtgccgaa 

ggtggtccgg 

cgaccgcgag 

gggggtggtc 

cgacctcgcc 

cccggaggcc 

tcagggtgcg 

ccggggtctg 

ggaccaggag 



18480 
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20280 

20340 

20400 
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20520 

20580 

20640 
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20820 
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gcggtgtcgt 

gcgctggaac 

gcggccttcg 

acacctgcgg 

ctggcggccc 

gccgcagccg 

ctcgggttcg 

ctgcgtctgc 

ctccacgacc

ctggccgcgc 

gagcgcctgg 

ccgaccgccg 

cgggaactcg 

ggacctgtga 

cgggccgccc 

gcctgccgcc 

gggcacgaga 

cacccggacc 

gtggcgggct 

ccgcaacagc 

ccgcactccc 

ggcgagaacg 

gctgtcgcct 

gacaccgcgt 

ggcgagtcga 

gtcgacttca 

gccgccgacg 

gaggccgaaa 

gacggggcca 

caggcgctac 

accggcacca 

gaccgggatc 

caggcggcgg 

ctgcccgcca 

gtacgcctgg 

gtgtcggcgt 

cggaccaccg 

cggtcggcgg 

gacgtcgggc 

cgggcggcgg 

gcggtcgaac 

gtcgtcttcc 

gactcggcac 

caggactggt 

gtcgacgtgg 

tcgtacgggg 

cacgtggcgg 

ttgctgcggt 

gtacgccgcc 

cggtcggtgg 

gccgagggcg 

gacagggtcc 

atcaccttct 

tactggtacc 

gactcgggat 

gccgaggcgg 

ggcgacggcg 

gacgtcgact 

ttccaacgga 

gcctaccggg 

tggctggtgg 



tcctgcgtga 

gggtcctcac 

ccgagtcgta 

cggcggtcgg 

tgccccgggc 

tgctcggcag 

actcgctggc 

cggccaccct 

gactcggcga 

tggagcaggc 

aacggatgct 

gtgacgacct 

acgccaggtg 

ctgacaacga 

gcaagcgcct 

taccgggcgg 

cggtgtccac 

ccgaccaccc 

tcgacgccga 

ggctgctgtt 

tgcgtggcac 

gcaccgaagc 

ccgggcggat 

gctcgtcgtc 

gtctcgctgt 

gccgccagcg 

ggttcggctt 

gcaacggcca 

gcaacggtct 

gaaactgcgg 

cgctcggcga 

cggaccaccc 

cgggcgtcac 

ccctgcacgt 

cgacccgggg 

tcggcatcag 

agcgcaccgt 

cggcgctacg 

tggcggaggt 

tggtggcgtc 

cgcgcggcga 

tcttcccggg 

cggcgttcgc 

cggtctccga 

tgcagccggt 

tcacccccgc 

gtgcgctctc 

cgctgtccgg 

gactgcggtc 

tggtggccgg 

tacgggtccg 

gtgacgaact 

actcgacggt 

gcaacctgcg 

acgacgcgtt 

tcgaggaggc 

gaccgggggc 

ggacgcccgc 

agccgtactg 

tgtcctggac 

tgcaccccgg 



gcggggcgta 

cgccggggag 

cacctccgcc 

cgagcgcgac 

cgagcggtcg 

cgacgcgaag 

cgcggtccgg 

ggtcttcgag 

ggccggcgag 

cctgcccgac 

cgccgggctc 

gggggaggcc 

aacccgaact 

caaggtggcg 

gcgcgagctg 

ggtgcacctc 

cttccccacc 

cggcaccagc 

gttcttcggg 

ggagaccagt 

cccgaccggc 

cggtgacgcc 

ctcctacgcc 

gttggtggcg 

cgtcggcggg 

ggcgttggcc 

ctccgagggg 

cgaggtgttg 

cgccgcgccg 

cctgaccccg 

cccgatcgag 

gctgtggctg 

cgggctgctc 

cgacgagccc 

ccggccgtgg 

cgggaccaac 

cggcggcgac 

ggcccaggcg 

cgggcggagc 

gacccgggcc 

ggacaccgtc 

acaggggtcc 

cgacacgatc 

cgtgctccgg 

gctgttcgcg 

tgcggtggtg 

cctcgccgac 

gggcggcggc 

gtgggaggac 

ggaaccggag 

cgagatcgac 

cctgacggtc 

cgacgtccgt 

ggagacggtc 

cgtcgaggtc 

aggtgtcgag 

gttcctgcgg 

cctcccggga 

gctgcggtcg 

gccgatcacc 

gggcagcacc 



cggccgatgt 

accgcggtgg 

cggccccggc 

gagccgcgtg 

gcggagctgg 

gccgtacccg 

ttccgtaacc 

cacccgaacg 

ccgacccccg 

gcctccgaca 

cgccccgagg 

ggcgtcgacg 

gaccgcagcc 

gagtacctcc 

caatccgacc 

ccgcagcacc 

gggcgcggct 

tacgtcgacc 

atctccccgc 

tgggagctgg 

gtcttcctcg 

gagggctatt 

ctcgggctgg 

ctgcacctgg 

gcggcggtca 

gctgacggca 

gtctccctcg 

gctgtcatcc 

aacgggaccg 

gccgacgtgg 

gccaacgccc 

gggtcggtga 

aagatggtgc 

accccgcacg 

cggcggggtg 

gcccacgtga 

gtcggcccgg 

gcccaggtcg 

ctggccgtga 

gaggcggtgc 

accggggtcg 

cagtgggtcg 

cgcgcctgcg 

caggagccgg 

gtgatggtgt 

gggcactcgc 

gcggcgaggc 

atgagcgccg 

cggatctccg 

gcgctgcggg 

gtcgactacg 

acgggggaga 

gctgtcgacg 

cggttcgccg 

agcccgcatc 

gacgccgtcg 

tcggcggcca 

gctgcgacga 

tctgctcccg 

ccgcccgggg 

ggatgggtcg 



cggtgccgag 

tcgtcgccga 

cgctgctcca 

agcagaccct 

tacgcctggt 

ccaccacgcc 

ggctggccgc 

ccgcagccgt 

tccggtcggt 

cggagcgggt 

ccggagccgg 

aactcctcga 

gcagccgaag 

gtcgtgcgac 

cgatcgcggt 

tgtgggacct 

gggacctggc 

ggggtgggtt 

gcgaggccac 

tggagagcgc 

gcgtggcgcg 

cggtgaccgg 

agggtccgtc 

cggtcgagtc 

tggcgacacc 

ggtcgaaggc 

tcctgctcga 

gtggctccgc 

cccagcgcaa 

acgccgtgga 

tgctggacac 

agtcgaacat 

tggcactgcg 

tggactggtc 

accggccgag 

tcgtcgagga 

tcccgctcgt 

ccgagctggt 

cccgggcgcg 

gggggctgcg 

ccgagacgtc 
ggatgggcgc 
acgaggcgat 
gggcaccggg 
cgttggcgcg 
agggggagat 
tggtggtggg
tcgcgctcgg 
tggccgccgt 
agtggggacg 
cctcgcactc 
tcgagccccg 
gcaccgacct 
acgcgatgac 
cggtggtggt 
tcgtcggcac 
ccgcccactg 
tcccgttgcc 
cccccgcctc 
acggcgtact 
acgggttggc 



ggcactggaa 
cgtcgactgg 
ccggctcgtc 
ccgggaccgg 
ccggcgggac 
gttcaaggac 
ccacaccggt 
cgccgacctc 
gggcgccgga 
cgagctggtc 
ggccgacgcc 
cgcgctcgaa 
cagagaccga 
gctcgacctg 
cgtcggcatg 
cctgcgccag 
cgggctcttc 
cctcgacgac 
ggccatggac 
cggcatcgat 
gctcggctac 
ggtggcaccc 
gatcagcgtg 
gctgcggctg 
aggggtgttc 
cttcggggcc 
acggctctcc 
cctcaaccag 
ggtgatccgg 
ggcgcacggc 
ctacggccgt 
cggccacacg 
ccacgaggaa 
ctcgggagcg 
gcgggccggg 
ggcacccgag 
ggtgtccgcc 
ggagggctcc 
acacgagcac 
cgaggtcgcg 
cgggcgcacc 
ggagctgctg 
ggcaccgttg 
actggaccgg 
gttgtggcag 
cgccgccgcc 
ccgcagccgg 
tgaggccgag 
caacggaccc 
ggagcgggag 
gccgcagatc 
gtcggcggag 
ggacgcgggg 
ccggttggcc 
gtcggcggtc
cctgtcccgg
cgccggtgtg 
gacgtacccg 
ccacgatctc 
cgacggcgac 
ggcggcgatc 
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accgccggcg 

ctggccgagg 

accgacgaac 

ggtgacgccg

gtcgacggtg 

cggctggagc 

gccgggacgc 

cgtggcgacc 

gggttcaccc 

ctggcccggt 

ggcgaggagt 

gaggcggagg 

gagacgttga 

gtcgcggcga 

gaacgggagg 

tacgccgccg 

gccagcgcct 

ctgcgcgagc 

ctgctccgcg 

gagggtttcg 

gaccccgacg 

atcgcggcgc 

gtcgcggagg 

gaactcggcc 

ggcctgcgga 

tacctgcgtc 

accgacgagg 

gccacccccg 

cccaccgacc 

accagctacg 

ttcgggatca 

atcgcgtggg 

accggcgtct 

gaccggctca 

gcctacacct 

ctggtcgcca 

gccggcgggg 

gggctcgccg 

gccgagggcg 

caggtgctgg 

gccgccccga 

ctgcgtcccg 

ccgatcgagg 

ctgggctcgg 

atcaaggcgg 

ttgtccccgc 

tggccccccg 

aacgcccacg 

gccccgggcg 

caggcgcgga 

gcccgtaccc 

gaccgggagg 

gtcgtcgccc 

tcgcagtggg 

atgggccggt 

cgtggggtcg 

gcggtgatgg 

gtgggtcact 

gacgccgcca 

ggcatggtgt 

gggcgggtcg 



gtggccgggt 
cgctcgcccg 
ggcacgtcga 
gaatcgacgc 
acctggcccg 
tggcccgccg 
gtctggtcgc 
gtctctacgg 
cgcacggcac 
ggctcgccga 
tgctgaccgc 
cactgcgtac 
cgaacttcgc 
agaccgcgct 
tctactgctc 
gcagcgccta 
cggtggcctg 
gcggcctgcg 
ccggtgcggt 
cggccatccg 
gcgcgcccgt 
tgtccccgca 
tgctgggaca 
tcgactcgct 
tgccggcctc 
gactggtcgt 
ccgaacccgt 
aggacctctg 
ggggctggga 
tcgacagggg 
ccccccgcga
aggcggtgga
tcgtcggcat
acggctacca 
tcgggtggga 
tccacctcgc 
tgacggtcat 
ccgacgggcg 
tcgcggcgct 
cggtgctgcg 
acgggccgtc 
ccgacgtcga 
ccggggcgct 
tgaagacgaa 
tcctggcgat 
acatcgactg 
gtgagcgccc 
tcatcgtcga 
ggcccctgcc 
ccctcgccga 
tggccaccgg 
gtgtctgcgc 
cggcggtctt 
tcggcatggc 
gcgccgaggc 
gcgaccccga 
tgtcgctggc 
cgcaggggga 
gggtggtggc 
cggtcggcac 
cggtggcggc 



cgtcgcccac 
gcgggacggc 
ggccggtgcg 
accactgtgg 
accggcgcag 
cttcggtggg 
ggcggtcctc 
ccgtcgcctg 
cgtcctggtc 
acggggtgcc 
gatccgggcc 
ggcgatcggc 
cggcgtcgcc 
gccgacggtc 
gtcggtggcc 
cctcgacgcc 
gaccccgtgg 
cagcctcgac 
gtcggtggcc 
gccgaccccg 
cgaccggccg 
ggaacagcgg 
cgagaccggc 
gggctcgatg 
gctggtcttc
cggggactcc 
cgccgtggtc 
gcgggtggtg 
cctccggcgg 
gggattcctc 
ggcgctggcg 
acgggcgggc 
gaacggccag 
ggggttgggc 
ggggccggcg 
catgcagtcg 
ggccgacccg 
gtgcaaggcg 
cgtcctcgaa 
cggcagcgcc 
gcaggaacgg 
catggtggag 
catcgcggcg 
catcggccac 
gcggcacggc 
ggcggacggg 
ccgccgcgcc 
ggaggcaccc 
cttcgtcctg 
acacctgcgc 
tcgcgcccgt 
cgccctcgac 
cgccgcccgt 
ccgtgacctg 
gctgtcgccg 
cccgtacgac 
gcggttgtgg 
gatcgccgcc 
gttgcgcagc
ctcccgcgcc 
ggtgaacgga 



ccggtggact 

acgttccggg 

gtcgccctgc 

tgcctgaccc 

gccgccctgc 

gtgctcgacc 

gccggcggcg 

gtcagggcga 

accggcgcgg 

acccgactcg 

gccggtgcca 

ggggagttgc 

gacgccgacc 

ctggcggagg 

ggggtctggg 

ctggtcgagc 

gccctgcccg 

gtggccgacg 

gtcgccgacg 

ctcttcgacg 

ggggagccgg 

gagacgttgc 

accgagatca 

gccctgcgtc 

gaccacccga 

gacccgaccc 

ggcatcggct 

tccgagggca 

ctctaccacc 

gacggggccc 

atggacccgc 

atcgacccgg 

tcctacctgc 

aactcggcga 

ctgacggtgg 

ctgcgtcggg 

tacaccttcg 

ttctccgcgc 

ccgttgtcca 

gtcaaccagg 

gtgatcaggc 

gcgcacggga 

tacggccggg 

acccaggccg 

gtactcccga 

aaggtcgagg 

ggggtgtcct 

gccgaaccgg 

cacggacgca 

accaccggcc 

ttcgacgtcc 

gcgctggcgc 

acccccgtcc 

ctcgactcct 

tacaccgact 

cgggtggacg 

cagtcgtacg 

gcgcacgtgg 

cgggtgctgc 

gagttggact 

cccggcacgc 



ccgtgacctc 

gggtgctgtc 

tgaccctggc 

aggaggcggt 

acggtttcgc 

tgcccgccac 

gcgaggacgt 

ccctgccgcc 

ccggtccggt 

tcctgcccgg 

ccgccgtggt 

cgaccgcgct 

ccgaggactt 

tgctcggcga 

gtggggtcgg 

accgtcgcgc 

gcgcggtcga 

ccctcgggac 

tcgactggtc 

aactcctcga 

cgggcgagtg 

tgaccctcgt 

acacccgtcg 

agcgcctggc 

cggtcaccgc 

cggtacgggt 

gccggttccc 

cctccatcac 

ccgacccgga 

cggacttcga 

agcagcggct 

agaccctcct 

aactgctgac 

gcgtgctctc 

acaccgcctg 

gtgagtgctc 

tggacttcag 

aggccgacgg 

aggcgcggcg 

acggggccag 

aggccctgac 

cgggcaccga 

accgggaccg 

ccgccggtgc 

ggtcgctgca 

tgctccgcga 

ccttcggcgt 

accccgaacc 

gcgtccagac 

accgggacct 

gggccgcagt 

aggatcgccc 

tggtcttccc 

ccgaggtgtt 

gggacctgct 

tgctccagcc 

gggtgactcc 

ctggtgcgtt 

gggagctcga 

cggtcctgcg 

tcgtggtggc 



ccggaccggc 

gtgggtggcg 

gcaggcgttg 

ccgtaccccc 

ccaggtcgcc 

cgtcgacgcc 

cgtcgccgtc 

gcccggcggg 

gggcggtcgg

cgcacacccg 

gtgcgaaccg 

cgtacacgcc 

cgccgccacc 

ccaccgcctc 

catggccgcg 

ccgggggcac 

cgacggtcgg 

gtgggaacgt 

ggtcttcaca 

ccggcgcggg 

gggtcgacga 

cggcgagacg 

ggccttcagc 

ggcccgtacc 

gctcgcgcgg 

gttcggcccc 

cggcggcatc 

caccggattc 

ccaccccggc 

ccccgggttc 

caccctggag 

cggcagcgac 

cggggagggt 

cggccgtgtc 

ctcgtcctcg 

gctggcgttg 

cgcacagcgg 

gttcgccctc 

aaacggccac 

caacggcctc 

cgcctccggg 

actcggcgac 

gccgctctgg 

cgccggggtg 

cgccgacgag 

ggcacgacag 

cagcgggacc 

ggttcccgcc 

ggtccggtcc 

cgccgacacc 

gctcggcacc 

ctcgcccgac 

cgggcagggg 

cgccgagtcg 

cgacgtggtc 

ggtgctgttc 

gggtgcggtg 

gtcgttggcc 

cgaccagggc 

ccggtgggac 

cggacccacc 



25800 

25860 

25920 

25980 

26040 

26100 

26160 

26220 

26280 

26340 

26400 

26460 

26520 

26580 

26640 

26700 

26760 

26820 

26880 

26940 

27000 

27060 

27120 

27180 

27240 

27300 

27360 

27420 

27480 

27540 

27600 

27660 

27720 

27780 

27840 

27900 

27960 

28020 

28080 

28140 

28200 

28260 

28320 

28380 

28440 

28500 

28560 

28620 

28680 

28740 

28800 

28860 

28920 

28980 

29040 

29100 

29160 

29220 

29280 

29340

29400 






gccgaactgg acgagttcct cgcggtggcc gaggcccgcg 
gcggtgcgct acgcgtcgca ctccccggag gtggcccggg 
gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc 
gacctcctcg acaccacagc catggacgcc gggtactggt 
gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg 
gtcagcccgc accctgtgct gctgatggcg gtcgaggaga 
ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg 
aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc 
ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc 
caccgcaggg ccgacacctc gtcgctgggg gtccgtgact 
gccgcagtcg acgtacccgg tcacggcgga gcggtgttca 
gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga 
ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg 
gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt 
gccgccgacg aggacgggcg gcggccggtc gagatccacg 
ccggccgagg cccggtggtc ggcgtacgcg accgggaccc 
ggcggccggg acggcacaca gtggcccccg cccggcgcca 
cactacgaca ccctcgccga actgggctac gagtacgggc 
gccgcgtggc agcacggcga cgtggtctac gcggaggtgt 
gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc 
cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca 
actgcggtac gggtggtggc gacccccgcc ggaccggacg 
gacccgaccg gtcagctcgt cgccacggtg gacgccctgg 
gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc 
gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg 
ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg
gacgacccga cggccgaggc ccgtcacggg gtgctctggg 
tggctcgacg acgaccggtg gcccgccacc accctggtgg 
gaggtctccc ccggggacga cgtgccgcgc cccggggccg 
cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg 
cccccggcgg tgccggacaa tccgcagctc gcggtccgtg 
cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg
cccggcaacg gcggctccat cgaggcagtg gccttcgccc 
cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca 
gtcctgctcg cgctcggcat gtacccggaa ccggccgaga 
gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc 
ctgttccagg gggccttcgg gccggtggcg gtcgccgacc
cccgacgggt ggcgggcggt ggacgccgca gccgtaccca 
tacgcgctgc acgacctggc cgggttgcag gccgggcagt 
gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc 
gccacggcca gcccggccaa acacccgacg ctgcgggcgc 
atcgcctcgt cccgggagag cgggttcggt gagcggttcg 
ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc 
ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg 
ttccggggcc ggtacgtccc gttcgacctg gccgaggccg 
atcctggagg aggtcgtcgg tctgctggcc gccggtgccc 
gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca 
ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg 
ggcgggaccg gcaccctggg gcggctggtc gcccgccacc 
ccccacctcc tggtggccag ccggcgcggt ccggcggccc 
gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg 
gaggcgctcg cggcgctgct cgactcgatc cccgcggacc 
cacaccgccg gggtcctggc cgacgggctg gtcacctcca 
caggtcctgc gggccaaggt cgacgcggcg tggcacctgc 
gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg 
ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg 
ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc 
ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc 
gccctgttcg acgcggctct gcgcagcggc ggggaggtgc 
aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc 
acgccacggg ccgccaacag ggccgagacc ccgggccggg 



agatgaggcc 
tcgaacagcg 
tctactccac 
accgcaacct 
gattcgagac 
ccgccgagga 
ggccgtcgga 
tgcgtccggc 
agcggctctg 
cgacccaccc 
ccgggcggct 
acctggtgcc 
tgccggtgct 
tgctgcgcct 
ccgccgagga 
tcgccgtcgg 
ccgccctgac 
cggcgttcca 
ccctcgacgc 
agaccttcgg 
ccctgcacgc 
cggtggccct 
tcgtcaggga 
gcctggagtg 
cggccgacgg 
tccgctaccg 
cggccacgct 
tggccacgtc 
ccgccgtgtg 
tcgacggcga 
acggtgcggt 
accgggcgta 
ccgtccccga 
ccggcgtgaa 
tgggcaccga 
ccggccaggc 
accggctcct 
tcgcgttcac 
ccgtgctggt 
gggccggggc 
tcggcctcga 
ccgcgcgtac 
tcgacgagtc 
acctgcggcc 
gtcccgatcg 
tcgaccggtt 
tgagccgggg 
acggaacggt 
tggtgaccgg 
cgggcgcggc 
cctgcgacac 
gtccgctgac 
tcgacgggac 
acgacctgac 
tgctggccgg 
ccgggcaacg 
aggccagcga 
tgccgaccga 
tgttcccgct 
tgcgcggcgc 
gcctgctcga 



gcgtcggatc 
gctcgccgcc 
cgccaccggg 
gcgccaaccg 
gttcatcgag 
cgccgagcgc 
gttcctccgc 
ggtcgcccac 
gcccaagccg 
gctgctgcac 
ctcccccgac 
cggcagtgtc 
ggaggaactc 
gtcggtcggc 
cgtctccgac 
cgtggccggc 
gttgaccgac 
ggcgctgcgc 
cgtcgaggag 
cctgaccagt 
caccggggcc 
gcgggtcacc 
cgccggggcg 
ggtacggctg 
gctcgacgac 
tcccgacggc 
cgtgcgccgt 
cgcaggggtc 
gggggtgctg 
cccggagacg 
gttcgtgcca 
ccggctggtg 
cgccgaccgg 
cttccgtgac 
ggcgtccggt 
ggtgacgggc 
caccccggtc 
caccgcccac 
ccacgccgcc 
ggaggtgttc 
cgacgaccac 
cggggggcgg 
cgcgcggctg 
ggcggagcag 
gctcggcgag 
gccggtgtcg 
ccgacacgtg 
gctggtcacc 
gcacggcgta 
cgagctgcgc 
cgccgaccgg 
cggggtggtg 
cgccaccgat 
ccgggacgcg 
tcccgggcag 
gcgggccctc 
gatgaccagc 
gcgggcgctg 
gtctgtcgac 
ggtccggtcc 
ccgtctcgtc 
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31500 

31560 

31620 

31680 

31740 

31800 

31860 

31920 
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32040 
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ggtgcacccg agaccgatca ggtggccgcg 
gcggtcgccg gctacgactc ggccgaccag 
gggttcgact cgctggcggc ggtggagctg 
cggctgccca gcacgctggt gttcgaccac 
cggtcggagt tgttcgccga ctccgcgccg 
ctggaacggg cgctcgacgc cctgcccgac 
ctggaggcgc tgctgcgccg gtggcagagc 
atcagtgacg acgccagtga cgacgagctg 
ggaggggacg tctaggtgac aggtcgattc 
acaggtccac cgggttcgcg tcgcctccca 
atccgatgag cgagagcagc ggcatgaccg 
ccgtcgccga actcgactcg gtgacaggtc 
aaccgatcgc cgtcgtcggc atggcctgcc 
cgttctggga gttcatccgc gacggtggtg 
gctggccgcc ggcaccgcga ccccgcctcg 
acgccgcctt cttcggcatc tcaccccgcg 
tgatgctgga gatctcctgg gaggcgttgg 
gcggcagcgc cggtggcgtc ttcaccggtg 
acgaggcacc cgaggaggtg ctcggctacg 
ccggacgggt ggcgtacacc ctggggttgg 
gctcctccgg gctcaccgcg gtgcacctgg 
ccctggtcct cgccggtggg gtcaccgtga 
gcagccaggg cgggttggcc gaggacggcc 
gcttcgggct cgccgagggg gccggggtcc 
ccgagggccg gccggtgctg gccgtactgc 
gcaacgggct caccgcgccg agcggccccg 
agcgggcgcg gctgcgtccc gtcgacgtgg 
ggctgggcga tccgatcgag gcgcacgccc 
ccggccgccc gctctgggtc ggatcggtga 

£99999^99° c 9999 t 9 at 9 aagaccgtgc 
cgttgcactt cgacgagccc tcgccgcacg 
tgtccgagac ccggccctgg ccggtggggg 
tcggcatcag cggcaccaac gcgcacgtca 
ccgacctcga cccgaccccc ggcccggcaa 
ccaccgccga gccgggtgcg gaggcggtcg 
ccctgcgcgc ccaggcggcc cggctcgccg 
tgcgcgacac cgccttcacc ctggtcaccc 
tcgtcggcgg gggcgaggag gtcctcgccg 
tcgacggagc cgtcagcggg cgggcgcgcg 
ggcagggcgc acagtggcag ggcatggccc 
cggagtccat cgacgcctgc gagcgggcgc 
aggtgctcga cggcgagcag tcgttggacc 
cggtgatggt gtcgttggcg cggttgtggc 
tgggtcactc gcagggggag atcgccgccg 
acgccgccag ggtggtggcg ttgcgcagcc 
ggatggcgtc gttcgggctc caccccgacc 
gtgcgctgac tgtcgcctcg gtcaacggtc 
gcccgttgga cgagctgatc gccgagtgcg 
ccgtcgacta cgcctcacac tccccgcagg 
cactggccgg ggtccgtccg gtgtcggccg 
aggtcatcga aacggcgacg atggacgccg 
tgcgcttcca ggacgccacc aggcagctcg 
tcagcccgca cccggtgttg acagtcggtg 
ccgacgcgga tccgtgtgtc acaggcaccc 
tccacaccgc gctcgccgag gcgtacaccc 
tgggtgaggg acgcccggtc gacctgccgg 
tcccggtccc cctgggccgg gtccccgaca 
ggcaccccgt cgacctcggg cggtcctccc 
cggcagtacc cccggcctgg acggacgtgg 
ccgtcgtgtt gtgcaccgcg cagtcgcgcg 
acggcaccgc cctgtccact gtggtctctc 



ctggccgagc tggtccgctc 
ctgcccgaac gcaaggcgtt 
cgcaaccggc tcggcgtcac 
ccgacaccgc tggcggtggc
gacgtcgggg tcggtgcgcg 
gcgcagggac acgccgacgt 
cgacgacccc cggagaccga 
ttctcgatgc tcgacaggcg 
cgccccgcgg cagtggaccg 
cacccgacgg ccggggtatc 
aggaccgcct ccggcgctat 
ggctcgacga ggtcgagtac 
ggttccccgg gggtgtggac 
acgcgatcgc cgaggcgccc 
gtggtctcct cgcggagccg 
aggcgctcgc gacggacccc 
agcgtgcggg tttcgacccg 
tcggtgcggt ggactacgga 
tcggcatcgg caccgcctcc 
agggtccagc cgtcaccgtc 
cgatggagtc gctgcgccgc 
tgagcagccc gggtgcgttc 
gctgcaaacc gttctcccgc 
tggtgctcca acggctgtcc 
gtggctcggc gatcaaccag 
cccagcggcg ggtgatcagg 
actacgtgga ggcccacggc 
tgctcgacac gtacggtgcc 
agtccaacat cggtcacacc 
tggcgctgcg gcatcgggag 
tcgactggga ccggggtgcg 
agcgcccgcg ccgggcgggg 
tcgtcgagga ggcgccgagc 
ccggagcgac ccccggaacg 
cactggtgtt ctccgcgcgc 
accgtctcac cgacgacccg 
gccgtgccac ctgggagcat 
gcctccgggc cgtcgccggg 
ccggccgccg ggtggtgctg 
gggacctgct gcggcagtcg 
tcgccccgca cgtggactgg 
ccgtcgacgt ggtgcagccg 
agtcgtacgg ggtgactccg 
cgcacgtggc tggtgcgttg 
gggtgctgcg ccgtctcggt 
aggccgccga gcggatcgcg 
cccgttcggt ggtgctggcc 
aggccgaggg cgtgaccgcc 
tggagtcgct gcgtgaggag 
ggatccccct gtactcgacc 
actactggtt cgccaacctc 
ccgaggcggg gttcgacgcc 
tcgaggccac cctcgaggca 
tgcgccgcga acgcggcggt 
ggggggtgga ggtcgactgg 
tctacccgtt ccaacgacag 
ccggcgacga gtggcgttac 
tggccggacg ggtcctggtg 
tccgcgacgg cctggaacag 
cccggatcgg cgccgcactc 
tgctcgcgct cgccgagggc 



gcacgcggcg 

caaggacctc 

caccggcgta 

cgaacacctg 

cctcgacgac 

cggggcccgc 

gccagtgacg 

tctcggcggg 

taccgccctg 

cacggaaggg 

ctcaagcgca 

cgggcccgcg 

tcgccggagg 

acggaccgtg 

ggcgcgttcg 

cagcagcgcc 

tcgagcctgc 

cccaggccgg 

agcgtcgcct 

gacaccgcct 

gacgagtgca 

accgagttcc 

gccgccgacg 

gtcgcccggg 

gacggtgcca 

caggcgttgg

accggcaccc 

gaccgggaac 

caggcggcgg 

atcccggcga 

gtgtcggtgg 

gtgtcctcgt 

ccgcaggcgg 

gatgccgccc 

gacgagcggg 

gccccctcgt 

cgggcggtcg 

ggacgtcccg 

gtcttccccg 

ccgaccttcg 

tcgctgcgcg 

gtgctgttcg 

ggtgcggtgg 

tcgttggccg 

ggtcacggcg 

cgcttcgcgg 

ggggagaacg 

cgtcggatcc 

ctgctcgccg 

ctgaccggtc 

cgggagccgg 

ttcgtcgagg 

gtgctgcccc 

ctcgcgcagt 

cgtaccgcag 

aacttctggc 

cagctcgcct 

gtgaccggag 

cgcggggcga 

gacgccgtcg 

ggtgctgtcg 



33120 

33180 

33240 

33300 

33360 

33420

33480 

33540 

33600 

33660 

33720 

33780 

33840 

33900 

33960 

34020 

34080 

34140 

34200 

34260 

34320 

34380 

34440 

34500 

34560 

34620 

34680 

34740 

34800 

34860 

34920 

34980 

35040 

35100 

35160 

35220 

35280 

35340 

35400 

35460 

35520 

35580 

35640 

35700 

35760 

35820 

35880 

35940 

36000 

36060 

36120 

36180 

36240 

36300 

36360 

36420 

36480 

36540 

36600 
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acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 
tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 
cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 
ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca
tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 
cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 
tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 
cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 
acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 
ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca
cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 
gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 
gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 
ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 
gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 
gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 
cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 
tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 
aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 
ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 
aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 
gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 
ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 
cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 
tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 
tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 
cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 
ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 
tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 
ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 
tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 
cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 
ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 
gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 
ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 
ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 
ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 
gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 
gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 
gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 
ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 
gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 
cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 
tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 
cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 
cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 
tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 
tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 
tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 
gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 
acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 
cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 
gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 
cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 
tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 
tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 
cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 
gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 
ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 
actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 
actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 
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cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 4 04 4 0 

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500 

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560 

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620 

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680 

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740

ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800 

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860 

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980 

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100 

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220 

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640 

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700 

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880 

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060 

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300

acccgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540 

a^ttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600 

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780 

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840 

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900 

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960 

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140 

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200

cgutggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 4 3260 

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620 

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860 

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920 

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040






ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 44100

ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 44160

ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 44220

gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 44280

ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 44340

ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 44400

gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 44460

caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 44520

acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 44580

gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 44640

agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 44700 

tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 44760

cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 44820

tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 44880

gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 44940

cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 45000 

tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 45060

cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 45120

gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc 45180

tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg 45240

cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 45300

cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 45360

gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 45420

aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 45480 

aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 45540 

ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 45600

gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 45660

cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 45720 

cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 45780 

cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 45840

cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 45900

cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 45960 

cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 46020

ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 46080

ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 46140

ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 46200 

tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 46260

gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 46320

gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 46380

cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 46440

ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 46500

cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 46560
aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 46620
gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 46680
cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 46740
cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 46800
gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 46860
cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 46920
cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 46980 
catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 47040 
ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 47100 
acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 47160 
ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 47220 
gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 47280 
ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 47340 
gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 47400 
catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 47460
accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 47520 
acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 47580 
ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 47640
ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 47700
acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820 

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940 

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser

1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp

20 25 30

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu
35 40 45

<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 3 

Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala

1 5 10 15

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met 

20 25 30 

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu 

35 40 45 

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val 

50 55 60 

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe 
65 70 75 80 

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys

85 90 95 

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr 

100 105 110 

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg

115 120 125 

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn 

130 135 140 

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala 
145 150 155 160 

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr 

165 170 175

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr

180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly

195 200 205 

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn 

210 215 220 

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala 
225 230 235 240 

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu 

245 250 255 

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala 

260 265 270 

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val
275 280 285

Thr Val Val Val Ala Ala Ala Asn Arg Asp Pro Glu Val Phe Thr Asp

290 295 300

Pro Asp Arg Phe Asp Val Asp Arg Gly Gly Asp Ala Glu Ile Leu Ser
305 310 315 320 

Ser Arg Pro Gly Ser Pro Arg Thr Asp Leu Asp Ala Leu Val Ala Thr 

325 330 335 

Leu Ala Thr Ala Ala Leu Arg Ala Ala Ala Pro Val Leu Pro Arg Leu 

340 345 350 

Ser Arg Ser Gly Pro Val Ile Arg Arg Arg Arg Ser Pro Val Ala Arg

355 360 365 

Gly Leu Ser Arg Cys Pro Val Glu Leu 
370 375 

<210> 4
<211> 436 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 4 

Met Arg Val Val Phe Ser Ser Met Ala Val Asn Ser His Leu Phe Gly 

1 5 10 15 

Leu Val Pro Leu Ala Ser Ala Phe Gln Ala Ala Gly His Glu Val Arg

20 25 30 

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Val Thr Gly Ala Gly Leu 

35 40 45 

Thr Ala Val Pro Val Gly Asp Asp Val Glu Leu Val Glu Trp His Ala 

50 55 60 

His Ala Gly Gln Asp Ile Val Glu Tyr Met Arg Thr Leu Asp Trp Val
65 70 75 80

Asp Gln Ser His Thr Thr Met Ser Trp Asp Asp Leu Leu Gly Met Gln

85 90 95

Thr Thr Phe Thr Pro Thr Phe Phe Ala Leu Met Ser Pro Asp Ser Leu

100 105 110

Ile Asp Gly Met Val Glu Phe Cys Arg Ser Trp Arg Pro Asp Trp Ile

115 120 125

Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr

130 135 140 

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg 
145 150 155 160 

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His

165 170 175 

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe 

180 185 190 

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln

195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val

210 215 220 

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val 
225 230 235 240 

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr 

245 250 255 

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly

260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr

275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val

290 295 300 

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala 
305 310 315 320 

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile

325 330 335

His Gly Val Pro Gln Ile Ile Leu Ser Asp Ala Asp Thr Glu Val His

340 345 350

Ala Lys Gln Leu Gln Asp Leu Gly Ala Gly Leu Ser Leu Pro Val Ala

355 360 365

Gly Met Thr Ala Glu His Leu Arg Gly Ala Ile Glu Arg Val Leu Asp

370 375 380

Glu Pro Ala Tyr Arg Leu Gly Ala Glu Arg Met Arg Asp Gly Met Arg
385 390 395 400

Thr Asp Pro Ser Pro Ala Gln Val Val Gly Ile Cys Gln Asp Leu Ala

405 410 415

Ala Asp Arg Ala Ala Arg Gly Arg Gln Pro Arg Arg Thr Ala Glu Pro

420 425 430

His Leu Pro Arg

435

<210> 5 
<211> 390 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 5 



Met Val Thr Ser Thr Asn Leu Asp Thr Thr Ala Arg Pro Ala Leu Asn

1 5 10 15

Ser Leu Thr Gly Met Arg Phe Val Ala Ala Phe Leu Val Phe Phe Thr

20 25 30

His Val Leu Ser Arg Leu Ile Pro Asn Ser Tyr Val Tyr Ala Asp Gly

35 40 45

Leu Asp Ala Phe Trp Gln Thr Thr Gly Arg Val Gly Val Ser Phe Phe

50 55 60

Phe Ile Leu Ser Gly Phe Val Leu Thr Trp Ser Ala Arg Ala Ser Asp
65 70 75 80

Ser Val Trp Ser Phe Trp Arg Arg Arg Val Cys Lys Leu Phe Pro Asn

85 90 95

His Leu Val Thr Ala Phe Ala Ala Val Val Leu Phe Leu Val Thr Gly

100 105 110

Gln Ala Val Ser Gly Glu Ala Leu Ile Pro Asn Leu Leu Leu Ile His

115 120 125

Ala Trp Phe Pro Ala Leu Glu Ile Ser Phe Gly Ile Asn Pro Val Ser

130 135 140

Trp Ser Leu Ala Cys Glu Ala Phe Phe Tyr Leu Cys Phe Pro Leu Phe
145 150 155 160

Leu Phe Trp Ile Ser Gly Ile Arg Pro Glu Arg Leu Trp Ala Trp Ala

165 170 175

Ala Val Val Phe Ala Ala Ile Trp Ala Val Pro Val Val Ala Asp Leu

180 185 190

Leu Leu Pro Ser Ser Pro Pro Leu Ile Pro Gly Leu Glu Tyr Ser Ala

195 200 205

Ile Gln Asp Trp Phe Leu Tyr Thr Phe Pro Ala Thr Arg Ser Leu Glu

210 215 220

Phe Ile Leu Gly Ile Ile Leu Ala Arg Ile Leu Ile Thr Gly Arg Trp
225 230 235 240

Ile Asn Val Gly Leu Leu Pro Ala Val Leu Leu Phe Pro Val Phe Phe

245 250 255

Val Ala Ser Leu Phe Leu Pro Gly Val Tyr Ala Ile Ser Ser Ser Met

260 265 270

Met Ile Leu Pro Leu Val Leu Ile Ile Ala Ser Gly Ala Thr Ala Asp

275 280 285

Leu Gln Gln Lys Arg Thr Phe Met Arg Asn Arg Val Met Val Trp Leu

290 295 300

Gly Asp Val Ser Phe Ala Leu Tyr Met Val His Phe Leu Val Ile Val
305 310 315 320

Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu

325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val

340 345 350

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu Leu Pro Val Met Arg Asn 

355 360 365 

Trp Ala Arg Pro Ala Ser Ala Arg Arg Lys Pro Ala Thr Glu Pro Glu 

370 375 380 

Gln Thr Pro Ser Arg Arg
385 390 

<210> 6 
<211> 374 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 6 

Met Thr Thr Tyr Val Trp Ser Tyr Leu Leu Glu Tyr Glu Arg Glu Arg 

1 5 10 15 

Ala Asp Ile Leu Asp Ala Val Gln Lys Val Phe Ala Ser Gly Ser Leu

20 25 30

Ile Leu Gly Gln Ser Val Glu Asn Phe Glu Thr Glu Tyr Ala Arg Tyr

35 40 45

His Gly Ile Ala His Cys Val Gly Val Asp Asn Gly Thr Asn Ala Val

50 55 60 

Lys Leu Ala Leu Glu Ser Val Gly Val Gly Arg Asp Asp Glu Val Val 
65 70 75 80 

Thr Val Ser Asn Thr Ala Ala Pro Thr Val Leu Ala Ile Asp Glu Ile

85 90 95 

Gly Ala Arg Pro Val Phe Val Asp Val Arg Asp Glu Asp Tyr Leu Met 

100 105 110 

Asp Thr Asp Leu Val Glu Ala Ala Val Thr Pro Arg Thr Lys Ala Ile

115 120 125

Val Pro Val His Leu Tyr Gly Gln Cys Val Asp Met Thr Ala Leu Arg

130 135 140

Glu Leu Ala Asp Arg Arg Gly Leu Lys Leu Val Glu Asp Cys Ala Gln
145 150 155 160 

Ala His Gly Ala Arg Arg Asp Gly Arg Leu Ala Gly Thr Met Ser Asp 

165 170 175 

Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly 

180 185 190 

Asp Gly Gly Ala Val Val Thr Asn Asp Asp Glu Thr Ala Arg Ala Leu 

195 200 205 

Arg Arg Leu Arg Tyr Tyr Gly Met Glu Glu Val Tyr Tyr Val Thr Arg 

210 215 220 

Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu
225 230 235 240 

Arg Arg Lys Leu Thr Arg Leu Asp Ala Tyr Val Ala Gly Arg Arg Ala 

245 250 255 

Val Ala Gln Arg Tyr Val Asp Gly Leu Ala Asp Leu Gln Asp Ser His

260 265 270 

Gly Leu Glu Leu Pro Val Val Thr Asp Gly Asn Glu His Val Phe Tyr 

275 280 285 

Val Tyr Val Val Arg His Pro Arg Arg Asp Glu Ile Ile Lys Arg Leu

290 295 300

Arg Asp Gly Tyr Asp Ile Ser Leu Asn Ile Ser Tyr Pro Trp Pro Val
305 310 315 320 

His Thr Met Thr Gly Phe Ala His Leu Gly Val Ala Ser Gly Ser Leu 

325 330 335 

Pro Val Thr Glu Arg Leu Ala Gly Glu Ile Phe Ser Leu Pro Met Tyr

340 345 350

Pro Ser Leu Pro His Asp Leu Gln Asp Arg Val Ile Glu Ala Val Arg

355 360 365

Glu Val Ile Thr Gly Leu
370 

<210> 7 
<211> 257 
<212> PRT 
<213> Micromonospora megalomicea



<400> 7 



Met Pro Asn Ser His Ser Thr Thr Ser Ser Thr Asp Val Ala Pro Tyr

1 5 10 15

Glu Arg Ala Asp Ile Tyr His Asp Phe Tyr His Gly Arg Gly Lys Gly

20 25 30

Tyr Arg Ala Glu Ala Asp Ala Leu Val Glu Val Ala Arg Lys His Thr

35 40 45

Pro Gln Ala Ala Thr Leu Leu Asp Val Ala Cys Gly Thr Gly Ser His

50 55 60

Leu Val Glu Leu Ala Asp Ser Phe Arg Glu Val Val Gly Val Asp Leu
65 70 75 80

Ser Ala Ala Met Leu Ala Thr Ala Ala Arg Asn Asp Pro Gly Arg Glu

85 90 95

Leu His Gln Gly Asp Met Arg Asp Phe Ser Leu Asp Arg Arg Phe Asp

100 105 110

Val Val Thr Cys Met Phe Ser Ser Thr Gly Tyr Leu Val Asp Glu Ala

115 120 125

Glu Leu Asp Arg Ala Val Ala Asn Leu Ala Gly His Leu Ala Pro Gly

130 135 140

Gly Thr Leu Val Val Glu Pro Trp Trp Phe Pro Glu Thr Phe Arg Pro
145 150 155 160

Gly Trp Val Gly Ala Asp Leu Val Thr Ser Gly Asp Arg Arg Ile Ser

165 170 175

Arg Met Ser His Thr Val Pro Ala Gly Leu Pro Asp Arg Thr Ala Ser

180 185 190

Arg Met Thr Ile His Tyr Thr Val Gly Ser Pro Glu Ala Gly Ile Glu

195 200 205

His Phe Thr Glu Val His Val Met Thr Leu Phe Ala Arg Ala Ala Tyr

210 215 220

Glu Gln Ala Phe Gln Arg Ala Gly Leu Ser Cys Ser Tyr Val Gly His
225 230 235 240

Asp Leu Phe Ser Pro Gly Leu Phe Val Gly Val Ala Ala Glu Pro Gly

245 250 255

Arg



<210> 8 
<211> 201 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 8 

Met Arg Val Glu Glu Leu Gly Ile Glu Gly Val Phe Thr Phe Thr Pro

1 5 10 15

Gln Thr Phe Ala Asp Glu Arg Gly Val Phe Gly Thr Ala Tyr Gln Glu

20 25 30

Asp Val Phe Val Ala Ala Leu Gly Arg Pro Leu Phe Pro Val Ala Gln

35 40 45

Val Ser Thr Thr Arg Ser Arg Arg Gly Val Val Arg Gly Val His Phe

50 55 60

Thr Thr Met Pro Gly Ser Met Ala Lys Tyr Val Tyr Cys Ala Arg Gly
65 70 75 80


Arg 


Ala 


Met 


Asp 


Phe 


Ala 


XT s T 


TV 

/\sp 


lie 


Arg 


v> 

fro 


uiy 


O y- 




i. ni 


true 








O D 










QA 














Gly 


Arg 


Ala 


Glu 


Pro 


Val 


Glu 


Leu 




"TV T _ 

Ala 


Glu 


ber 


Me u 


Vdl 


oiy 


ijeu 




100 










1UD 










Tin 






Tyr 


Leu 


Pro 


Val 


Gly 


Met 


Gly 


HIS 


Leu 


f ne 


vai 


oer 


Lcli 




nop 






115 










120 










125 








Thr Thr Leu Val Tyr Leu Met Ser Ala Gly Tyr Val Pro Asp Lys Glu

130 135 140

Arg Ala Val His Pro Leu Asp Pro Glu Leu Ala Leu Pro Ile Pro Ala
145 150 155 160

Asp Leu Asp Leu Val Met Ser Glu Arg Asp Arg Val Ala Pro Thr Leu

165 170 175

Arg Glu Ala Arg Asp Gln Gly Ile Leu Pro Asp Tyr Ala Ala Cys Arg

180 185 190

Ala Ala Ala His Arg Val Val Arg Thr

195 200

<210> 9 
<211> 328 
<212> PRT 

<213> Micromonospora megalomicea



<400> 9 

Met Val Val Leu Gly Ala Ser Gly Phe Leu Gly Ser Ala Val Thr His

1 5 10 15

Ala Leu Ala Asp Leu Pro Val Arg Val Arg Leu Val Ala Arg Arg Glu

20 25 30

Val Val Val Pro Ser Gly Ala Val Ala Asp Tyr Glu Thr His Arg Val

35 40 45

Asp Leu Thr Glu Pro Gly Ala Leu Ala Glu Val Val Ala Asp Ala Arg

50 55 60

Ala Val Phe Pro Phe Ala Ala Gln Ile Arg Gly Thr Ser Gly Trp Arg
65 70 75 80

Ile Ser Glu Asp Asp Val Val Ala Glu Arg Thr Asn Val Gly Leu Val

85 90 95

Arg Asp Leu Ile Ala Val Leu Ser Arg Ser Pro His Ala Pro Val Val

100 105 110

Val Phe Pro Gly Ser Asn Thr Gln Val Gly Arg Val Thr Ala Gly Arg

115 120 125

Val Ile Asp Gly Ser Glu Gln Asp His Pro Glu Gly Val Tyr Asp Arg

130 135 140

Gln Lys His Thr Gly Glu Gln Leu Leu Lys Glu Ala Thr Ala Ala Gly
145 150 155 160

Ala Ile Arg Ala Thr Ser Leu Arg Leu Pro Pro Val Phe Gly Val Pro

165 170 175

Ala Ala Gly Thr Ala Asp Asp Arg Gly Val Val Ser Thr Met Ile Arg

180 185 190

Arg Ala Leu Thr Gly Gln Pro Leu Thr Met Trp His Asp Gly Thr Val

195 200 205

Arg Arg Glu Leu Leu Tyr Val Thr Asp Ala Ala Arg Ala Phe Val Thr

210 215 220

Ala Leu Asp His Ala Asp Ala Leu Ala Gly Arg His Phe Leu Leu Gly
225 230 235 240

Thr Gly Arg Ser Trp Pro Leu Gly Glu Val Phe Gln Ala Val Ser Arg

245 250 255

Ser Val Ala Arg His Thr Gly Glu Asp Pro Val Pro Val Val Ser Val

260 265 270

Pro Pro Pro Ala His Met Asp Pro Ser Asp Leu Arg Ser Val Glu Val

275 280 285

Asp Pro Ala Arg Phe Thr Ala Val Thr Gly Trp Arg Ala Thr Val Thr

290 295 300

Met Ala Glu Ala Val Asp Arg Thr Val Ala Ala Leu Ala Pro Arg Arg
305 310 315 320

Ala Ala Ala Pro Ser Glu Pro Ser

325



<210> 10 
<211> 330 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 10 

Met Gly Thr Thr Gly Ala Gly Ser Ala Arg Val Arg Val Gly Arg Ser 

1 5 10 15 

Ala Leu His Thr Ser Arg Leu Trp Leu Gly Thr Val Asn Phe Ser Gly 

20 25 30 

Arg Val Thr Asp Asp Asp Ala Leu Arg Leu Met Asp His Ala Leu Glu 

35 40 45 

Arg Gly Val Asn Cys Ile Asp Thr Ala Asp Ile Tyr Gly Trp Arg Leu

50 55 60

Tyr Lys Gly His Thr Glu Glu Leu Val Gly Arg Trp Phe Ala Gln Gly
65 70 75 80 

Gly Gly Arg Arg Glu Glu Thr Val Leu Ala Thr Lys Val Gly Ser Glu 

85 90 95 

Met Ser Glu Arg Val Asn Asp Gly Gly Leu Ser Ala Arg His Ile Val

100 105 110

Ala Ala Cys Glu Asn Ser Leu Arg Arg Leu Gly Val Asp His Ile Asp

115 120 125

Ile Tyr Gln Thr His His Ile Asp Arg Ala Ala Pro Trp Asp Glu Val

130 135 140

Trp Gln Ala Ala Glu His Leu Val Gly Ser Gly Lys Val Gly Tyr Val
145 150 155 160

Gly Ser Ser Asn Leu Ala Gly Trp His Ile Ala Ala Ala Gln Glu Ser

165 170 175

Ala Ala Arg Arg Asn Leu Leu Gly Met Ile Ser His Gln Cys Leu Tyr

180 185 190

Asn Leu Ala Val Arg His Pro Glu Leu Asp Val Leu Pro Ala Ala Gln

195 200 205 

Ala Tyr Gly Val Gly Val Phe Ala Trp Ser Pro Leu His Gly Gly Leu 

210 215 220 

Leu Ser Gly Val Leu Glu Lys Leu Ala Ala Gly Thr Ala Val Lys Ser 
225 230 235 240 

Ala Gln Gly Arg Ala Gln Val Leu Leu Pro Ala Val Arg Pro Leu Val

245 250 255 

Glu Ala Tyr Glu Asp Tyr Cys Arg Arg Leu Gly Ala Asp Pro Ala Glu 

260 265 270 

Val Gly Leu Ala Trp Val Leu Ser Arg Pro Gly Ile Leu Gly Ala Val

275 280 285

Ile Gly Pro Arg Thr Pro Glu Gln Leu Asp Ser Ala Leu Arg Ala Ala

290 295 300

Glu Leu Thr Leu Gly Glu Glu Glu Leu Arg Glu Leu Glu Ala Ile Phe
305 310 315 320 

Pro Ala Pro Ala Val Asp Gly Pro Val Pro 

325 330 



<210> 11 
<211> 417 

<212> PRT
<213> Micromonospora megalomicea 

<400> 11 
Met Arg Val Leu Leu Thr Ser Phe Ala His Arg Thr His Phe Gln Gly

1 5 10 15

Leu Val Pro Leu Ala Trp Ala Leu His Thr Ala Gly His Asp Val Arg

20 25 30

Val Ala Ser Gln Pro Glu Leu Thr Asp Val Val Val Gly Ala Gly Leu

35 40 45

Thr Ser Val Pro Leu Gly Ser Asp His Arg Leu Phe Asp Ile Ser Pro

50 55 60

Glu Ala Ala Ala Gln Val His Arg Tyr Thr Thr Asp Leu Asp Phe Ala
65 70 75 80

Arg Arg Gly Pro Glu Leu Arg Ser Trp Glu Phe Leu His Gly Ile Glu

85 90 95

Glu Ala Thr Ser Arg Phe Val Phe Pro Val Val Asn Asn Asp Ser Phe

100 105 110

Val Asp Glu Leu Val Glu Phe Ala Met Asp Trp Arg Pro Asp Leu Val

115 120 125

Leu Trp Glu Pro Phe Thr Phe Ala Gly Ala Val Ala Ala Lys Ala Cys

130 135 140

Gly Ala Ala His Ala Arg Leu Leu Trp Gly Ser Asp Leu Thr Gly Tyr
145 150 155 160

Phe Arg Ser Arg Ser Gln Asp Leu Arg Gly Gln Arg Pro Ala Asp Asp

165 170 175

Arg Pro Asp Pro Leu Gly Gly Trp Leu Thr Glu Val Ala Gly Arg Phe

180 185 190

Gly Leu Asp Tyr Ser Glu Asp Leu Ala Val Gly Gln Trp Ser Val Asp

195 200 205

Gln Leu Pro Glu Ser Phe Arg Leu Glu Thr Gly Leu Glu Ser Val His

210 215 220

Thr Arg Thr Leu Pro Tyr Asn Gly Ser Ser Val Val Pro Gln Trp Leu
225 230 235 240

Arg Thr Ser Asp Gly Val Arg Arg Val Cys Phe Thr Gly Gly Tyr Ser

245 250 255

Ala Leu Gly Ile Thr Ser Asn Pro Gln Glu Phe Leu Arg Thr Leu Ala

260 265 270

Thr Leu Ala Arg Phe Asp Gly Glu Ile Val Val Thr Arg Ser Gly Leu

275 280 285

Asp Pro Ala Ser Val Pro Asp Asn Val Arg Leu Val Asp Phe Val Pro

290 295 300

Met Asn Ile Leu Leu Pro Gly Cys Ala Ala Val Ile His His Gly Gly
305 310 315 320

Ala Gly Ser Trp Ala Thr Ala Leu His His Gly Val Pro Gln Ile Ser

325 330 335

Val Ala His Glu Trp Asp Cys Val Leu Arg Gly Gln Arg Thr Ala Glu

340 345 350

Leu Gly Ala Gly Val Phe Leu Arg Pro Asp Glu Val Asp Ala Asp Thr

355 360 365

Leu Trp Gln Ala Leu Ala Thr Val Val Glu Asp Arg Ser His Ala Glu

370 375 380

Asn Ala Glu Lys Leu Arg Gln Glu Ala Leu Ala Ala Pro Thr Pro Ala
385 390 395 400

Glu Val Val Pro Val Leu Glu Ala Leu Ala His Gln His Arg Ala Asp

405 410 415

Arg



<210> 12 
<211> 313 
<212> PRT 

<213> Micromonospora megalomicea
<400> 12

Met Thr Arg His Val Thr Leu Leu Gly Val Ser Gly Phe Val Gly Ser

1 5 10 15

Ala Leu Leu Arg Glu Phe Thr Thr His Pro Leu Arg Leu Arg Ala Val

20 25 30

Ala Arg Thr Gly Ser Arg Asp Gln Pro Pro Gly Ser Ala Gly Ile Glu

35 40 45

His Leu Arg Val Asp Leu Leu Glu Pro Gly Arg Val Ala Gln Val Val

50 55 60 

Ala Asp Thr Asp Val Val Val His Leu Val Ala Tyr Ala Ala Gly Gly 
65 70 75 80 

Ser Thr Trp Arg Ser Ala Ala Thr Val Pro Glu Ala Glu Arg Val Asn 

85 90 95 

Ala Gly Ile Met Arg Asp Leu Val Ala Ala Leu Arg Ala Arg Pro Gly

100 105 110

Pro Ala Pro Val Leu Leu Phe Ala Ser Thr Thr Gln Ala Ala Asn Pro

115 120 125

Ala Ala Pro Ser Arg Tyr Ala Gln His Lys Ile Glu Ala Glu Arg Ile

130 135 140

Leu Arg Gln Ala Thr Glu Asp Gly Val Val Asp Gly Val Ile Leu Arg
145 150 155 160

Leu Pro Ala Ile Tyr Gly His Ser Gly Pro Ser Gly Gln Thr Gly Arg

165 170 175

Gly Val Val Thr Ala Met Ile Arg Arg Ala Leu Ala Gly Glu Pro Ile

180 185 190 

Thr Met Trp His Glu Gly Ser Val Arg Arg Asn Leu Leu His Val Glu 

195 200 205 

Asp Val Ala Thr Ala Phe Thr Ala Ala Leu His Asn His Glu Ala Leu 

210 215 220 

Val Gly Asp Val Trp Thr Pro Ser Ala Asp Glu Ala Arg Pro Leu Gly 
225 230 235 240 

Glu Ile Phe Glu Thr Val Ala Ala Ser Val Ala Arg Gln Thr Gly Asn

245 250 255

Pro Ala Val Pro Val Val Ser Val Pro Pro Pro Glu Asn Ala Glu Ala

260 265 270

Asn Asp Phe Arg Ser Asp Asp Phe Asp Ser Thr Glu Phe Arg Thr Leu 

275 280 285 

Thr Gly Trp His Pro Arg Val Pro Leu Ala Glu Gly Ile Asp Arg Thr

290 295 300

Val Ala Ala Leu Ile Ser Thr Lys Glu
305 310 



<210> 13 

<211> 3546 

<212> PRT 

<213> Micromonospora megalomicea 



<400> 13 
Met Val Asp Val Pro Asp Leu Leu Gly Thr Arg Thr Pro His Pro Gly

1 5 10 15

Pro Leu Pro Phe Pro Trp Pro Leu Cys Gly His Asn Glu Pro Glu Leu

20 25 30

Arg Ala Arg Ala Arg Gln Leu His Ala Tyr Leu Glu Gly Ile Ser Glu

35 40 45

Asp Asp Val Val Ala Val Gly Ala Ala Leu Ala Arg Glu Thr Arg Ala

50 55 60

Gln Asp Gly Pro His Arg Ala Val Val Val Ala Ser Ser Val Thr Glu
65 70 75 80

Leu Thr Ala Ala Leu Ala Ala Leu Ala Gln Gly Arg Pro His Pro Ser

85 90 95

Val Val Arg Gly Val Ala Arg Pro Thr Ala Pro Val Val Phe Val Leu

100 105 110

Pro Gly Gln Gly Ala Gln Trp Pro Gly Met Ala Thr Arg Leu Leu Ala

115 120 125 

Glu Ser Pro Val Phe Ala Ala Ala Met Arg Ala Cys Glu Arg Ala Phe 

130 135 140 

Asp Glu Val Thr Asp Trp Ser Leu Thr Glu Val Leu Asp Ser Pro Glu
145 150 155 160

His Leu Arg Arg Val Glu Val Val Gln Pro Ala Leu Phe Ala Val Gln

165 170 175 

Thr Ser Leu Ala Ala Leu Trp Arg Ser Phe Gly Val Arg Pro Asp Ala 

180 185 190 

Val Leu Gly His Ser Ile Gly Glu Leu Ala Ala Ala Glu Val Cys Gly

195 200 205

Ala Val Asp Val Glu Ala Ala Ala Arg Ala Ala Ala Leu Trp Ser Arg 

210 215 220 

Glu Met Val Pro Leu Val Gly Arg Gly Asp Met Ala Ala Val Ala Leu 
225 230 235 240

Ser Pro Ala Glu Leu Ala Ala Arg Val Glu Arg Trp Asp Asp Asp Val 

245 250 255 

Val Pro Ala Gly Val Asn Gly Pro Arg Ser Val Leu Leu Thr Gly Ala 

260 265 270 

Pro Glu Pro Ile Ala Arg Arg Val Ala Glu Leu Ala Ala Gln Gly Val

275 280 285

Arg Ala Gln Val Val Asn Val Ser Met Ala Ala His Ser Ala Gln Val

290 295 300 

Asp Ala Val Ala Glu Gly Met Arg Ser Ala Leu Thr Trp Phe Ala Pro 
305 310 315 320 

Gly Asp Ser Asp Val Pro Tyr Tyr Ala Gly Leu Thr Gly Gly Arg Leu 

325 330 335 

Asp Thr Arg Glu Leu Gly Ala Asp His Trp Pro Arg Ser Phe Arg Leu

340 345 350

Pro Val Arg Phe Asp Glu Ala Thr Arg Ala Val Leu Glu Leu Gln Pro

355 360 365

Gly Thr Phe Ile Glu Ser Ser Pro His Pro Val Leu Ala Ala Ser Leu

370 375 380

Gln Gln Thr Leu Asp Glu Val Gly Ser Pro Ala Ala Ile Val Pro Thr
385 390 395 400

Leu Gln Arg Asp Gln Gly Gly Leu Arg Arg Phe Leu Leu Ala Val Ala

405 410 415

Gln Ala Tyr Thr Gly Gly Val Thr Val Asp Trp Thr Ala Ala Tyr Pro

420 425 430 

Gly Val Thr Pro Gly His Leu Pro Ser Ala Val Ala Val Glu Thr Asp 

435 440 445 

Glu Gly Pro Ser Thr Glu Phe Asp Trp Ala Ala Pro Asp His Val Leu 

450 455 460 

Arg Ala Arg Leu Leu Glu Ile Val Gly Ala Glu Thr Ala Ala Leu Ala
465 470 475 480 

Gly Arg Glu Val Asp Ala Arg Ala Thr Phe Arg Glu Leu Gly Leu Asp 

485 490 495 

Ser Val Leu Ala Val Gln Leu Arg Thr Arg Leu Ala Thr Ala Thr Gly

500 505 510

Arg Asp Leu His Ile Ala Met Leu Tyr Asp His Pro Thr Pro His Ala

515 520 525

Leu Thr Glu Ala Leu Leu Arg Gly Pro Gln Glu Glu Pro Gly Arg Gly

530 535 540 

Glu Glu Thr Ala His Pro Thr Glu Ala Glu Pro Asp Glu Pro Val Ala 
545 550 555 560 

Val Val Ala Met Ala Cys Arg Leu Pro Gly Gly Val Thr Ser Pro Glu 

565 570 575 

Glu Phe Trp Glu Leu Leu Ala Glu Gly Arg Asp Ala Val Gly Gly Leu 

580 585 590 

Pro Thr Asp Arg Gly Trp Asp Leu Asp Ser Leu Phe His Pro Asp Pro

595 600 605

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly

610 615 620 

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala 
625 630 635 640 

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu

645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg

660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu

675 680 685 

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr 

690 695 700 

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly 
705 710 715 720 

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val

725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu

740 745 750 

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe 

755 760 765 

Ser Arg Met Asn Ser Leu Ala Pro Asp Gly Arg Ser Lys Ala Phe Ser 

770 775 780 

Ala Ala Ala Asp Gly Phe Gly Met Ala Glu Gly Ala Gly Met Leu Leu 
785 790 795 800 

Leu Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Pro Val Leu Ala 

805 810 815 

Val Ile Arg Gly Thr Ala Val Asn Ser Asp Gly Ala Ser Asn Gly Leu

820 825 830

Ser Ala Pro Asn Gly Arg Ala Gln Val Arg Val Ile Arg Gln Ala Leu

835 840 845

Ala Glu Ser Gly Leu Thr Pro His Thr Val Asp Val Val Glu Thr His

850 855 860

Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala Arg Ala Leu Ser
865 870 875 880

Asp Ala Tyr Gly Gly Asp Arg Glu His Pro Leu Arg Ile Gly Ser Val

885 890 895

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Leu

900 905 910

Ile Lys Leu Val Leu Ala Met Gln Ala Gly Val Leu Pro Arg Thr Leu

915 920 925

His Ala Asp Glu Pro Ser Pro Glu Ile Asp Trp Ser Ser Gly Ala Ile

930 935 940

Ser Leu Leu Gln Glu Pro Ala Ala Trp Pro Ala Gly Glu Arg Pro Arg
945 950 955 960

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Ala

965 970 975

Ile Ile Glu Glu Ala Pro Pro Thr Gly Asp Asp Thr Arg Pro Asp Arg

980 985 990 

Met Gly Pro Val Val Pro Trp Val Leu Ser Ala Ser Thr Gly Glu Ala 

995 1000 1005 

Leu Arg Ala Arg Ala Ala Arg Leu Ala Gly His Leu Arg Glu His Pro 

1010 1015 1020 

Asp Gln Asp Leu Asp Asp Val Ala Tyr Ser Leu Ala Thr Gly Arg Ala
1025 1030 1035 1040 

Ala Leu Ala Tyr Arg Ser Gly Phe Val Pro Ala Asp Ala Ser Thr Ala 

1045 1050 1055 

Leu Arg Ile Leu Asp Glu Leu Ala Ala Gly Gly Ser Gly Asp Ala Val

1060 1065 1070

Thr Gly Thr Ala Arg Ala Pro Gln Arg Val Val Phe Val Phe Pro Gly

1075 1080 1085

Gln Gly Trp Gln Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp

1090 1095 1100

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro
1105 1110 1115 1120

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180 

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg 
1185 1190 1195 1200 

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245 

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr 

1250 1255 1260 

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His 
1265 1270 1275 1280 

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly 

1285 1290 1295 

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr 

1300 1305 1310 

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg 

1315 1320 1325 

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala 

1330 1335 1340

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390 

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu 

1395 1400 1405 

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455 

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu 

1460 1465 1470 

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg 

1475 1480 1485 

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500 

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly 
1505 1510 1515 1520 

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala 

1525 1530 1535 

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu 

1540 1545 1550 

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn 
1585 1590 1595 1600 

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala 

1605 1610 1615 

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp 

1620 1625 1630 

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660 

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu 
1665 1670 1675 1680 

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695 

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly 

1700 1705 1710 

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725 

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val 

1730 1735 1740 

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr 
1745 1750 1755 1760 

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775 

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu 

1780 1785 1790 

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr 

1795 1800 1805 

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840 

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val 

1845 1850 1855 

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly
1905 1910 1915 1920

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val

1925 1930 1935 

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser 

1940 1945 1950 

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr 
1985 1990 1995 2000 

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu 

2005 2010 2015 

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045 

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala

2050 2055 2060

Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080 

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly 

2085 2090 2095 

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160 

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser 

2165 2170 2175 

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205 

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly 

2210 2215 2220 

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg 
2225 2230 2235 2240 

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270 

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val 

2275 2280 2285 

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr 

2325 2330 2335 

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr 

2340 2345 2350 

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val 

2355 2360 2365 

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400 

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu 

2405 2410 2415 

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val 

2420 2425 2430 

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 

2435 2440 2445 

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro 

2450 2455 2460 

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val 
2465 2470 2475 2480 

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495 

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr 

2500 2505 2510 

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala
2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575 

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser 

2580 2585 2590 

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val 

2595 2600 2605 

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val 

2610 2615 2620 

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640 

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser 

2645 2650 2655 

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670 

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu 

2675 2680 2685 

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val 

2690 2695 2700 

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720 

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala 

2725 2730 2735 

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780 

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr 
2785 2790 2795 2800 

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815 

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe 

2820 2825 2830 

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr 

2835 2840 2845 

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp 

2850 2855 2860 

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val 
2865 2870 2875 2880 

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu 

2885 2890 2895 

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910 

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val 

2915 2920 2925 

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg 

2930 2935 2940 

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val 
2945 2950 2955 2960 

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr

2965 2970 2975 

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val 

2980 2985 2990 

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu 

2995 3000 3005 

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120 

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu 

3125 3130 3135 

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr 

3140 3145 3150 

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg 

3155 3160 3165 

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala 

3170 3175 3180 

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215 

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His 

3220 3225 3230 

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser 

3235 3240 3245 

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260 

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu 
3265 3270 3275 3280 

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr 

3285 3290 3295 

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310 

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala 

3315 3320 3325 

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala 

3330 3335 3340 

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val 
3345 3350 3355 3360 

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375 

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu 

3380 3385 3390 

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp 

3395 3400 3405 

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp 

3410 3415 3420 

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly 
3425 3430 3435 3440 

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala 

3445 3450 3455 

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr 

3460 3465 3470 

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485 

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu 

3490 3495 3500 

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala 
3505 3510 3515 3520 

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu 
3525 3530 3535

Asp Ala Leu Glu Arg Glu Leu Asp Ala Arg 

3540 3545 

<210> 14 
<211> 3562 
<212> PRT 

<213> Micromonospora megalomicea
<400> 14 

Met Thr Asp Asn Asp Lys Val Ala Glu Tyr Leu Arg Arg Ala Thr Leu 

1 5 10 15

Asp Leu Arg Ala Ala Arg Lys Arg Leu Arg Glu Leu Gln Ser Asp Pro

20 25 30

Ile Ala Val Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val His Leu

35 40 45

Pro Gln His Leu Trp Asp Leu Leu Arg Gln Gly His Glu Thr Val Ser

50 55 60 

Thr Phe Pro Thr Gly Arg Gly Trp Asp Leu Ala Gly Leu Phe His Pro 
65 70 75 80 

Asp Pro Asp His Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu 

85 90 95 

Asp Asp Val Ala Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro Arg

100 105 110

Glu Ala Thr Ala Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser

115 120 125

Trp Glu Leu Val Glu Ser Ala Gly Ile Asp Pro His Ser Leu Arg Gly

130 135 140 

Thr Pro Thr Gly Val Phe Leu Gly Val Ala Arg Leu Gly Tyr Gly Glu 
145 150 155 160 

Asn Gly Thr Glu Ala Gly Asp Ala Glu Gly Tyr Ser Val Thr Gly Val 

165 170 175 

Ala Pro Ala Val Ala Ser Gly Arg Ile Ser Tyr Ala Leu Gly Leu Glu

180 185 190

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala

195 200 205 

Leu His Leu Ala Val Glu Ser Leu Arg Leu Gly Glu Ser Ser Leu Ala 

210 215 220 

Val Val Gly Gly Ala Ala Val Met Ala Thr Pro Gly Val Phe Val Asp 
225 230 235 240 

Phe Ser Arg Gln Arg Ala Leu Ala Ala Asp Gly Arg Ser Lys Ala Phe

245 250 255 

Gly Ala Ala Ala Asp Gly Phe Gly Phe Ser Glu Gly Val Ser Leu Val 

260 265 270 

Leu Leu Glu Arg Leu Ser Glu Ala Glu Ser Asn Gly His Glu Val Leu 

275 280 285 

Ala Val Ile Arg Gly Ser Ala Leu Asn Gln Asp Gly Ala Ser Asn Gly

290 295 300

Leu Ala Ala Pro Asn Gly Thr Ala Gln Arg Lys Val Ile Arg Gln Ala
305 310 315 320 

Leu Arg Asn Cys Gly Leu Thr Pro Ala Asp Val Asp Ala Val Glu Ala 

325 330 335 

His Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ala Asn Ala Leu

340 345 350 

Leu Asp Thr Tyr Gly Arg Asp Arg Asp Pro Asp His Pro Leu Trp Leu 

355 360 365

Gly Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val

370 375 380 

Thr Gly Leu Leu Lys Met Val Leu Ala Leu Arg His Glu Glu Leu Pro 
385 390 395 400 

Ala Thr Leu His Val Asp Glu Pro Thr Pro His Val Asp Trp Ser Ser

405 410 415

Gly Ala Val Arg Leu Ala Thr Arg Gly Arg Pro Trp Arg Arg Gly Asp

420 425 430

Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Ile Ser Gly Thr Asn

435 440 445

Ala His Val Ile Val Glu Glu Ala Pro Glu Arg Thr Thr Glu Arg Thr

450 455 460

Val Gly Gly Asp Val Gly Pro Val Pro Leu Val Val Ser Ala Arg Ser
465 470 475 480

Ala Ala Ala Leu Arg Ala Gln Ala Ala Gln Val Ala Glu Leu Val Glu

485 490 495

Gly Ser Asp Val Gly Leu Ala Glu Val Gly Arg Ser Leu Ala Val Thr

500 505 510

Arg Ala Arg His Glu His Arg Ala Ala Val Val Ala Ser Thr Arg Ala

515 520 525

Glu Ala Val Arg Gly Leu Arg Glu Val Ala Ala Val Glu Pro Arg Gly

530 535 540

Glu Asp Thr Val Thr Gly Val Ala Glu Thr Ser Gly Arg Thr Val Val
545 550 555 560

Phe Leu Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Ala Glu

565 570 575

Leu Leu Asp Ser Ala Pro Ala Phe Ala Asp Thr Ile Arg Ala Cys Asp

580 585 590

Glu Ala Met Ala Pro Leu Gln Asp Trp Ser Val Ser Asp Val Leu Arg

595 600 605

Gln Glu Pro Gly Ala Pro Gly Leu Asp Arg Val Asp Val Val Gln Pro

610 615 620

Val Leu Phe Ala Val Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr
625 630 635 640

Gly Val Thr Pro Ala Ala Val Val Gly His Ser Gln Gly Glu Ile Ala

645 650 655

Ala Ala His Val Ala Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Leu

660 665 670

Val Val Gly Arg Ser Arg Leu Leu Arg Ser Leu Ser Gly Gly Gly Gly

675 680 685

Met Ser Ala Val Ala Leu Gly Glu Ala Glu Val Arg Arg Arg Leu Arg

690 695 700

Ser Trp Glu Asp Arg Ile Ser Val Ala Ala Val Asn Gly Pro Arg Ser
705 710 715 720

Val Val Val Ala Gly Glu Pro Glu Ala Leu Arg Glu Trp Gly Arg Glu

725 730 735

Arg Glu Ala Glu Gly Val Arg Val Arg Glu Ile Asp Val Asp Tyr Ala

740 745 750

Ser His Ser Pro Gln Ile Asp Arg Val Arg Asp Glu Leu Leu Thr Val

755 760 765

Thr Gly Glu Ile Glu Pro Arg Ser Ala Glu Ile Thr Phe Tyr Ser Thr

770 775 780

Val Asp Val Arg Ala Val Asp Gly Thr Asp Leu Asp Ala Gly Tyr Trp
785 790 795 800

Tyr Arg Asn Leu Arg Glu Thr Val Arg Phe Ala Asp Ala Met Thr Arg

805 810 815

Leu Ala Asp Ser Gly Tyr Asp Ala Phe Val Glu Val Ser Pro His Pro

820 825 830

Val Val Val Ser Ala Val Ala Glu Ala Val Glu Glu Ala Gly Val Glu

835 840 845

Asp Ala Val Val Val Gly Thr Leu Ser Arg Gly Asp Gly Gly Pro Gly

850 855 860

Ala Phe Leu Arg Ser Ala Ala Thr Ala His Cys Ala Gly Val Asp Val
865 870 875 880

Asp Trp Thr Pro Ala Leu Pro Gly Ala Ala Thr Ile Pro Leu Pro Thr

885 890 895
Tyr Pro Phe Gln Arg Lys Pro Tyr Trp Leu Arg Ser Ser Ala Pro Ala

900 905 910

Pro Ala Ser His Asp Leu Ala Tyr Arg Val Ser Trp Thr Pro Ile Thr

915 920 925

Pro Pro Gly Asp Gly Val Leu Asp Gly Asp Trp Leu Val Val His Pro

930 935 940

Gly Gly Ser Thr Gly Trp Val Asp Gly Leu Ala Ala Ala Ile Thr Ala
945 950 955 960

Gly Gly Gly Arg Val Val Ala His Pro Val Asp Ser Val Thr Ser Arg

965 970 975

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly

980 985 990

Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala

995 1000 1005

Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp

1010 1015 1020

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp
1025 1030 1035 1040

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln

1045 1050 1055

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu

1060 1065 1070

Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu

1075 1080 1085

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr

1090 1095 1100

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe
1105 1110 1115 1120

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly

1125 1130 1135

Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val

1140 1145 1150

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala

1155 1160 1165

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg

1170 1175 1180

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr
1185 1190 1195 1200

Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala

1205 1210 1215

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val

1220 1225 1230

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala

1235 1240 1245

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala

1250 1255 1260

Tyr Leu Asp Ala Leu Val Glu His Arg Arg Ala Arg Gly His Ala Ser
1265 1270 1275 1280

Ala Ser Val Ala Trp Thr Pro Trp Ala Leu Pro Gly Ala Val Asp Asp

1285 1290 1295



Gly Arg Leu Arg Glu Arg Gly Leu Arg Ser Leu Asp Val Ala Asp Ala 

1300 1305 1310

Leu Gly Thr Trp Glu Arg Leu Leu Arg Ala Gly Ala Val Ser Val Ala 

1315 1320 1325 

Val Ala Asp Val Asp Trp Ser Val Phe Thr Glu Gly Phe Ala Ala Ile

1330 1335 1340 

Arg Pro Thr Pro Leu Phe Asp Glu Leu Leu Asp Arg Arg Gly Asp Pro 
1345 1350 1355 1360 

Asp Gly Ala Pro Val Asp Arg Pro Gly Glu Pro Ala Gly Glu Trp Gly 

1365 1370 1375 

Arg Arg Ile Ala Ala Leu Ser Pro Gln Glu Gln Arg Glu Thr Leu Leu

1380 1385 1390

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly 

1395 1400 1405 

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser

1410 1415 1420

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu
1425 1430 1435 1440

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu

1445 1450 1455

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro 

1460 1465 1470 

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val 

1475 1480 1485 

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu

1490 1495 1500

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr
1505 1510 1515 1520 

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His 

1525 1530 1535 

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro 

1540 1545 1550 

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala

1555 1560 1565

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val

1570 1575 1580

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly
1585 1590 1595 1600

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly

1605 1610 1615

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser

1620 1625 1630

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala

1635 1640 1645

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu

1650 1655 1660

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly
1665 1670 1675 1680

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala

1685 1690 1695

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln

1700 1705 1710

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu

1715 1720 1725

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu

1730 1735 1740

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala
1745 1750 1755 1760

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala

1765 1770 1775 

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr 

1780 1785 1790 

Gly Thr Glu Leu Gly Asp Pro Ile Glu Ala Gly Ala Leu Ile Ala Ala

1795 1800 1805 

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr 

1810 1815 1820 

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Ala Ala Gly Val Ile Lys
1825 1830 1835 1840 

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala 

1845 1850 1855 

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val

1860 1865 1870 
Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala

1875 1880 1885

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val Ile Val

1890 1895 1900

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro
1905 1910 1915 1920

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gln Thr Val

1925 1930 1935

Arg Ser Gln Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His

1940 1945 1950

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg

1955 1960 1965

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys

1970 1975 1980

Ala Ala Leu Asp Ala Leu Ala Gln Asp Arg Pro Ser Pro Asp Val Val
1985 1990 1995 2000

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly

2005 2010 2015

Gln Gly Ser Gln Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser

2020 2025 2030

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro

2035 2040 2045

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro

2050 2055 2060

Asp Pro Tyr Asp Arg Val Asp Val Leu Gln Pro Val Leu Phe Ala Val
2065 2070 2075 2080

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

2085 2090 2095

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

2100 2105 2110

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser

2115 2120 2125

Arg Val Leu Arg Glu Leu Asp Asp Gln Gly Gly Met Val Ser Val Gly

2130 2135 2140

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg
2145 2150 2155 2160

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly

2165 2170 2175

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu

2180 2185 2190

Met Arg Pro Arg Arg Ile Ala Val Arg Tyr Ala Ser His Ser Pro Glu

2195 2200 2205

Val Ala Arg Val Glu Gln Arg Leu Ala Ala Glu Leu Gly Thr Val Thr

2210 2215 2220

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu
2225 2230 2235 2240

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg

2245 2250 2255

Gln Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly

2260 2265 2270

Phe Glu Thr Phe Ile Glu Val Ser Pro His Pro Val Leu Leu Met Ala

2275 2280 2285

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro

2290 2295 2300

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu
2305 2310 2315 2320

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val

2325 2330 2335



Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gln

2340 2345 2350

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly

2355 2360 2365

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro 

2370 2375 2380 

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gln
2385 2390 2395 2400

Gln Trp Leu Thr Gln His Val Val Gly Gly Arg Asn Leu Val Pro Gly

2405 2410 2415 

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val 

2420 2425 2430 

Pro Val Leu Glu Glu Leu Val Leu Gln Gln Pro Leu Val Leu Thr Ala

2435 2440 2445 

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly 

2450 2455 2460 

Arg Arg Pro Val Glu Ile His Ala Ala Glu Asp Val Ser Asp Pro Ala
2465 2470 2475 2480 

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val 

2485 2490 2495 

Ala Gly Gly Gly Arg Asp Gly Thr Gln Trp Pro Pro Pro Gly Ala Thr

2500 2505 2510 

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr 

2515 2520 2525 

Glu Tyr Gly Pro Ala Phe Gln Ala Leu Arg Ala Ala Trp Gln His Gly

2530 2535 2540 

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr 
2545 2550 2555 2560 

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gln Thr Phe Gly Leu

2565 2570 2575 

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr 

2580 2585 2590 

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala 

2595 2600 2605 

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gln Leu

2610 2615 2620 

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg 
2625 2630 2635 2640 

Asp Gln Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val

2645 2650 2655 

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala 

2660 2665 2670 

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gln

2675 2680 2685 

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu 

2690 2695 2700 

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu 
2705 2710 2715 2720 

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala 

2725 2730 2735 

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala 

2740 2745 2750 

Ala Val Trp Gly Val Leu Arg Cys Ala Gln Ala Glu Ser Pro Asp Arg

2755 2760 2765 

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp 

2770 2775 2780 

Asn Pro Gln Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu
2785 2790 2795 2800 

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg 

2805 2810 2815 

Leu Val Pro Gly Asn Gly Gly Ser Ile Glu Ala Val Ala Phe Ala Pro

2820 2825 2830 

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala 
2835 2840 2845

Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly

2850 2855 2860 

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val 
2865 2870 2875 2880

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gln Ala Val

2885 2890 2895

Thr Gly Leu Phe Gln Gly Ala Phe Gly Pro Val Ala Val Ala Asp His

2900 2905 2910

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala

2915 2920 2925

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu

2930 2935 2940

Ala Gly Leu Gln Ala Gly Gln Ser Val Leu Val His Ala Ala Ala Gly
2945 2950 2955 2960 

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu 

2965 2970 2975 

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu 

2980 2985 2990 

Gly Leu Asp Asp Asp His Ile Ala Ser Ser Arg Glu Ser Gly Phe Gly

2995 3000 3005 

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu 

3010 3015 3020 

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala 
3025 3030 3035 3040 

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala 

3045 3050 3055 

Glu Gln Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly

3060 3065 3070

Pro Asp Arg Leu Gly Glu Ile Leu Glu Glu Val Val Gly Leu Leu Ala

3075 3080 3085 

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala 

3090 3095 3100

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys 
3105 3110 3115 3120 

Leu Val Leu Thr Gln Pro Ala Pro Val His Pro Asp Gly Thr Val Leu

3125 3130 3135 

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu 

3140 3145 3150

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly 

3155 3160 3165 

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu 

3170 3175 3180 

Gly Ala Thr Ile Glu Ile Val Ala Cys Asp Thr Ala Asp Arg Glu Ala
3185 3190 3195 3200

Leu Ala Ala Leu Leu Asp Ser Ile Pro Ala Asp Arg Pro Leu Thr Gly

3205 3210 3215

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser Ile

3220 3225 3230

Asp Gly Thr Ala Thr Asp Gln Val Leu Arg Ala Lys Val Asp Ala Ala

3235 3240 3245 

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val 

3250 3255 3260 

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gln Gly Val
3265 3270 3275 3280

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gln Arg Arg

3285 3290 3295

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gln

3300 3305 3310

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg Ile Ala Arg Thr Gly

3315 3320 3325

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala
3330 3335 3340

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser 
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly

3380 3385 3390

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gln Val Ala Ala

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gln Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gln Gly His Ala Asp Val Gly Ala Arg Leu Glu
3505 3510 3515 3520

Ala Leu Leu Arg Arg Trp Gln Ser Arg Arg Pro Pro Glu Thr Glu Pro

3525 3530 3535

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu

3540 3545 3550 

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val
3555 3560

<210> 15
<211> 3201
<212> PRT 

<213> Micromonospora megalomicea 
<400> 15 

Met Ser Glu Ser Ser Gly Met Thr Glu Asp Arg Leu Arg Arg Tyr Leu 

1 5 10 15

Lys Arg Thr Val Ala Glu Leu Asp Ser Val Thr Gly Arg Leu Asp Glu 

20 25 30 

Val Glu Tyr Arg Ala Arg Glu Pro Ile Ala Val Val Gly Met Ala Cys

35 40 45

Arg Phe Pro Gly Gly Val Asp Ser Pro Glu Ala Phe Trp Glu Phe Ile

50 55 60

Arg Asp Gly Gly Asp Ala Ile Ala Glu Ala Pro Thr Asp Arg Gly Trp
65 70 75 80 

Pro Pro Ala Pro Arg Pro Arg Leu Gly Gly Leu Leu Ala Glu Pro Gly 

85 90 95 

Ala Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

100 105 110

Thr Asp Pro Gln Gln Arg Leu Met Leu Glu Ile Ser Trp Glu Ala Leu

115 120 125 

Glu Arg Ala Gly Phe Asp Pro Ser Ser Leu Arg Gly Ser Ala Gly Gly 

130 135 140 

Val Phe Thr Gly Val Gly Ala Val Asp Tyr Gly Pro Arg Pro Asp Glu 
145 150 155 160

Ala Pro Glu Glu Val Leu Gly Tyr Val Gly Ile Gly Thr Ala Ser Ser

165 170 175 

Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly Pro Ala 

180 185 190 

Val Thr Val Asp Thr Ala Cys Ser Ser Gly Leu Thr Ala Val His Leu
195 200 205

Ala Met Glu Ser Leu Arg Arg Asp Glu Cys Thr Leu Val Leu Ala Gly 

210 215 220 

Gly Val Thr Val Met Ser Ser Pro Gly Ala Phe Thr Glu Phe Arg Ser
225 230 235 240

Gln Gly Gly Leu Ala Glu Asp Gly Arg Cys Lys Pro Phe Ser Arg Ala

245 250 255

Ala Asp Gly Phe Gly Leu Ala Glu Gly Ala Gly Val Leu Val Leu Gln

260 265 270 

Arg Leu Ser Val Ala Arg Ala Glu Gly Arg Pro Val Leu Ala Val Leu 

275 280 285 

Arg Gly Ser Ala Ile Asn Gln Asp Gly Ala Ser Asn Gly Leu Thr Ala

290 295 300

Pro Ser Gly Pro Ala Gln Arg Arg Val Ile Arg Gln Ala Leu Glu Arg
305 310 315 320 

Ala Arg Leu Arg Pro Val Asp Val Asp Tyr Val Glu Ala His Gly Thr 

325 330 335 

Gly Thr Arg Leu Gly Asp Pro Ile Glu Ala His Ala Leu Leu Asp Thr

340 345 350 

Tyr Gly Ala Asp Arg Glu Pro Gly Arg Pro Leu Trp Val Gly Ser Val 

355 360 365

Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Val

370 375 380

Met Lys Thr Val Leu Ala Leu Arg His Arg Glu Ile Pro Ala Thr Leu
385 390 395 400 

His Phe Asp Glu Pro Ser Pro His Val Asp Trp Asp Arg Gly Ala Val 

405 410 415 

Ser Val Val Ser Glu Thr Arg Pro Trp Pro Val Gly Glu Arg Pro Arg 

420 425 430 

Arg Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Val

435 440 445

Ile Val Glu Glu Ala Pro Ser Pro Gln Ala Ala Asp Leu Asp Pro Thr

450 455 460 

Pro Gly Pro Ala Thr Gly Ala Thr Pro Gly Thr Asp Ala Ala Pro Thr 
465 470 475 480 

Ala Glu Pro Gly Ala Glu Ala Val Ala Leu Val Phe Ser Ala Arg Asp 

485 490 495 

Glu Arg Ala Leu Arg Ala Gln Ala Ala Arg Leu Ala Asp Arg Leu Thr

500 505 510 

Asp Asp Pro Ala Pro Ser Leu Arg Asp Thr Ala Phe Thr Leu Val Thr 

515 520 525 

Arg Arg Ala Thr Trp Glu His Arg Ala Val Val Val Gly Gly Gly Glu 

530 535 540 

Glu Val Leu Ala Gly Leu Arg Ala Val Ala Gly Gly Arg Pro Val Asp 
545 550 555 560 

Gly Ala Val Ser Gly Arg Ala Arg Ala Gly Arg Arg Val Val Leu Val 

565 570 575 

Phe Pro Gly Gln Gly Ala Gln Trp Gln Gly Met Ala Arg Asp Leu Leu

580 585 590

Arg Gln Ser Pro Thr Phe Ala Glu Ser Ile Asp Ala Cys Glu Arg Ala

595 600 605 

Leu Ala Pro His Val Asp Trp Ser Leu Arg Glu Val Leu Asp Gly Glu 

610 615 620 

Gln Ser Leu Asp Pro Val Asp Val Val Gln Pro Val Leu Phe Ala Val
625 630 635 640

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

645 650 655

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

660 665 670 

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser

675 680 685

Arg Val Leu Arg Arg Leu Gly Gly His Gly Gly Met Ala Ser Phe Gly

690 695 700

Leu His Pro Asp Gln Ala Ala Glu Arg Ile Ala Arg Phe Ala Gly Ala
705 710 715 720

Leu Thr Val Ala Ser Val Asn Gly Pro Arg Ser Val Val Leu Ala Gly

725 730 735

Glu Asn Gly Pro Leu Asp Glu Leu Ile Ala Glu Cys Glu Ala Glu Gly

740 745 750

Val Thr Ala Arg Arg Ile Pro Val Asp Tyr Ala Ser His Ser Pro Gln

755 760 765

Val Glu Ser Leu Arg Glu Glu Leu Leu Ala Ala Leu Ala Gly Val Arg

770 775 780

Pro Val Ser Ala Gly Ile Pro Leu Tyr Ser Thr Leu Thr Gly Gln Val
785 790 795 800

Ile Glu Thr Ala Thr Met Asp Ala Asp Tyr Trp Phe Ala Asn Leu Arg

805 810 815

Glu Pro Val Arg Phe Gln Asp Ala Thr Arg Gln Leu Ala Glu Ala Gly

820 825 830

Phe Asp Ala Phe Val Glu Val Ser Pro His Pro Val Leu Thr Val Gly

835 840 845

Val Glu Ala Thr Leu Glu Ala Val Leu Pro Pro Asp Ala Asp Pro Cys

850 855 860

Val Thr Gly Thr Leu Arg Arg Glu Arg Gly Gly Leu Ala Gln Phe His
865 870 875 880

Thr Ala Leu Ala Glu Ala Tyr Thr Arg Gly Val Glu Val Asp Trp Arg

885 890 895

Thr Ala Val Gly Glu Gly Arg Pro Val Asp Leu Pro Val Tyr Pro Phe

900 905 910

Gln Arg Gln Asn Phe Trp Leu Pro Val Pro Leu Gly Arg Val Pro Asp

915 920 925

Thr Gly Asp Glu Trp Arg Tyr Gln Leu Ala Trp His Pro Val Asp Leu

930 935 940

Gly Arg Ser Ser Leu Ala Gly Arg Val Leu Val Val Thr Gly Ala Ala
945 950 955 960

Val Pro Pro Ala Trp Thr Asp Val Val Arg Asp Gly Leu Glu Gln Arg

965 970 975

Gly Ala Thr Val Val Leu Cys Thr Ala Gln Ser Arg Ala Arg Ile Gly

980 985 990

Ala Ala Leu Asp Ala Val Asp Gly Thr Ala Leu Ser Thr Val Val Ser

995 1000 1005

Leu Leu Ala Leu Ala Glu Gly Gly Ala Val Asp Asp Pro Ser Leu Asp

1010 1015 1020

Thr Leu Ala Leu Val Gln Ala Leu Gly Ala Ala Gly Ile Asp Val Pro
1025 1030 1035 1040

Leu Trp Leu Val Thr Arg Asp Ala Ala Ala Val Thr Val Gly Asp Asp

1045 1050 1055

Val Asp Pro Ala Gln Ala Met Val Gly Gly Leu Gly Arg Val Val Gly

1060 1065 1070

Val Glu Ser Pro Ala Arg Trp Gly Gly Leu Val Asp Leu Arg Glu Ala

1075 1080 1085

Asp Ala Asp Ser Ala Arg Ser Leu Ala Ala Ile Leu Ala Asp Pro Arg

1090 1095 1100

Gly Glu Glu Gln Phe Ala Ile Arg Pro Asp Gly Val Thr Val Ala Arg
1105 1110 1115 1120

Leu Val Pro Ala Pro Ala Arg Ala Ala Gly Thr Arg Trp Thr Pro Arg

1125 1130 1135

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu

1140 1145 1150

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn

1155 1160 1165

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu



1170 1175 1180 

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp
1185 1190 1195 1200

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gln Gly Arg

1205 1210 1215

Val Val Thr Ala Val Phe His Ala Ala Gly Ile Ser Arg Ser Thr Ala

1220 1225 1230

Val Gln Glu Leu Thr Glu Ser Glu Phe Thr Glu Ile Thr Asp Ala Lys

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser Ile Ala Trp Gly Leu Trp

1300 1305 1310

Ala Gly Gln Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser

1315 1320 1325

Gln Gly Leu Arg Ala Met Asp Pro Gln Arg Ala Ile Glu Glu Leu Arg

1330 1335 1340 

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp 
1345 1350 1355 1360

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu 

1365 1370 1375 

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gln

1380 1385 1390

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420

Gly His Gly Thr Pro Thr Val Ile Glu Arg Asp Val Ala Phe Arg Asp
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr Ile Val Phe Asp His Pro

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485 

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gln Ala Pro Gly

1490 1495 1500

Glu Ala Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Leu Ala
1505 1510 1515 1520

Gly Gly Val Arg Thr Pro Asp Gln Leu Trp Asp Phe Ile Val Ala Asp

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565 

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly Ile Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Gln
1585 1590 1595 1600

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly Ile Asp Pro

1605 1610 1615 

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr 

1620 1625 1630 

Gln Gly Tyr Gly Gln Asn Ala Gln Val Pro Lys Glu Ser Glu Gly Tyr

1635 1640 1645

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr

1650 1655 1660

Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser
1665 1670 1675 1680

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710

Glu Val Phe Thr Glu Phe Ser Arg Gln Gly Ala Leu Ala Pro Asp Gly

1715 1720 1725

Arg Cys Lys Pro Phe Ser Asp Gln Ala Asp Gly Phe Gly Phe Ala Glu

1730 1735 1740

Gly Val Ala Val Val Leu Leu Gln Arg Leu Ser Val Ala Val Arg Glu
1745 1750 1755 1760

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gln Asp

1765 1770 1775

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gln Gln Arg

1780 1785 1790

Val Ile Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840 

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gln

1845 1850 1855

Ala Ala Ala Gly Val Val Gly Val Ile Lys Val Val Leu Gly Leu Gly

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900 

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe 
1905 1910 1915 1920

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000 

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015 

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gln Gly Gly Gln Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val

2050 2055 2060 

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser 
2065 2070 2075 2080 

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gln Pro Val Leu Phe Val Val

2100 2105 2110 

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala 

2115 2120 2125 

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val Val Ala

2130 2135 2140

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala
2145 2150 2155 2160

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu Ile Ala Pro Trp Ser Asp Arg

2180 2185 2190

Ile Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly

2195 2200 2205

Asp Pro Gln Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly

2210 2215 2220 

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His 
2225 2230 2235 2240 

Val Glu Gln Ile Arg Asp Thr Ile Leu Thr Asp Leu Ala Asp Val Thr

2245 2250 2255 

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala Ile Leu Pro Ala Thr His Pro

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gln Val Ala Asp His Arg Tyr Arg Val Asp Trp
2385 2390 2395 2400

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro Ile Asp Cys Asp Gln Ala Met Val Trp Gly Ile

2500 2505 2510 

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val 

2515 2520 2525 

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala 

2530 2535 2540 

Leu Leu Ala Ala Asp Asp His Glu Asp Gln Val Ala Leu Arg Asp Gly
2545 2550 2555 2560

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575 

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620 

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640

Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro

2690 2695 2700

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720 

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala 

2725 2730 2735 

Leu Ala His Arg Arg Arg Gln Ala Gly Arg Ala Ala Thr Ser Val Ala

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800 

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro 

2805 2810 2815 

Arg Pro Leu Ile Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro

2820 2825 2830

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gln Leu Leu

2835 2840 2845

Gln Phe Thr Arg Ser His Val Ala Ala Ile Leu Gly His Gln Asp Pro

2850 2855 2860

Asp Ala Val Gly Leu Asp Gln Pro Phe Thr Glu Leu Gly Phe Asp Ser
2865 2870 2875 2880

Leu Thr Ala Val Gly Leu Arg Asn Gln Leu Gln Gln Ala Thr Gly Arg

2885 2890 2895

Thr Leu Pro Ala Ala Leu Val Phe Gln His Pro Thr Val Arg Arg Leu

2900 2905 2910

Ala Asp His Leu Ala Gln Gln Leu Asp Val Gly Thr Ala Pro Val Glu

2915 2920 2925

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gln Thr

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960 

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gln Leu Glu Leu

2965 2970 2975

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val Ile Cys Cys Ala

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gln Pro Gly Tyr

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gln Ala Asp Ala Val Leu Ala Ala Gln Gly Asp Thr Pro Phe Val Leu

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gln Glu Ala Val His Ala Trp Leu Gly Glu Leu

3090 3095 3100 

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg 
3105 3110 3115 3120 

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro
3125 3130 3135

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gln Ser Thr Trp Pro Phe Gly His

3155 3160 3165

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gln Glu His

3170 3175 3180

Ala Asp Ala Ile Ala Arg His Ile Asp Ala Trp Leu Ser Gly Glu Arg
3185 3190 3195 3200 

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 16 

Met Asn Thr Thr Asp Arg Ala Val Leu Gly Arg Arg Leu Gln Met Ile

1 5 10 15 

Arg Gly Leu Tyr Trp Gly Tyr Gly Ser Asn Gly Asp Pro Tyr Pro Met 

20 25 30 

Leu Leu Cys Gly His Asp Asp Asp Pro His Arg Trp Tyr Arg Gly Leu 

35 40 45 

Gly Gly Ser Gly Val Arg Arg Ser Arg Thr Glu Thr Trp Val Val Thr 

50 55 60 

Asp His Ala Thr Ala Val Arg Val Leu Asp Asp Pro Thr Phe Thr Arg 
65 70 75 80 

Ala Thr Gly Arg Thr Pro Glu Trp Met Arg Ala Ala Gly Ala Pro Ala 

85 90 95 

Ser Thr Trp Ala Gln Pro Phe Arg Asp Val His Ala Ala Ser Trp Asp

100 105 110

Ala Glu Leu Pro Asp Pro Gln Glu Val Glu Asp Arg Leu Thr Gly Leu

115 120 125 

Leu Pro Ala Pro Gly Thr Arg Leu Asp Leu Val Arg Asp Leu Ala Trp 

130 135 140 

Pro Met Ala Ser Arg Gly Val Gly Ala Asp Asp Pro Asp Val Leu Arg 
145 150 155 160 

Ala Ala Trp Asp Ala Arg Val Gly Leu Asp Ala Gln Leu Thr Pro Gln

165 170 175

Pro Leu Ala Val Thr Glu Ala Ala Ile Ala Ala Val Pro Gly Asp Pro

180 185 190 

His Arg Arg Ala Leu Phe Thr Ala Val Glu Met Thr Ala Thr Ala Phe 

195 200 205 

Val Asp Ala Val Leu Ala Val Thr Ala Thr Ala Gly Ala Ala Gln Arg

210 215 220 

Leu Ala Asp Asp Pro Asp Val Ala Ala Arg Leu Val Ala Glu Val Leu 
225 230 235 240 

Arg Leu His Pro Thr Ala His Leu Glu Arg Arg Thr Ala Gly Thr Glu 

245 250 255 

Thr Val Val Gly Glu His Thr Val Ala Ala Gly Asp Glu Val Val Val 

260 265 270

Val Val Ala Ala Ala Asn Arg Asp Ala Gly Val Phe Ala Asp Pro Asp 

275 280 285 

Arg Leu Asp Pro Asp Arg Ala Asp Ala Asp Arg Ala Leu Ser Ala Gln

290 295 300 

Arg Gly His Pro Gly Arg Leu Glu Glu Leu Val Val Val Leu Thr Thr 
305 310 315 320 

Ala Ala Leu Arg Ser Val Ala Lys Ala Leu Pro Gly Leu Thr Ala Gly

325 330 335

Gly Pro Val Val Arg Arg Arg Arg Ser Pro Val Leu Arg Ala Thr Ala
340 345 350

His Cys Pro Val Glu Leu 
355 

<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 17 

Met Arg Val Val Phe Ser Ser Met Ala Ser Lys Ser His Leu Phe Gly 

1 5 10 15

Leu Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg 

20 25 30 

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Ile Thr Ala Ala Gly Leu

35 40 45 

Thr Ala Val Pro Val Gly Thr Asp Val Asp Leu Val Asp Phe Met Thr 

50 55 60 

His Ala Gly Tyr Asp Ile Ile Asp Tyr Val Arg Ser Leu Asp Phe Ser
65 70 75 80

Glu Arg Asp Pro Ala Thr Ser Thr Trp Asp His Leu Leu Gly Met Gln

85 90 95 

Thr Val Leu Thr Pro Thr Phe Tyr Ala Leu Met Ser Pro Asp Ser Leu 

100 105 110 

Val Glu Gly Met Ile Ser Phe Cys Arg Ser Trp Arg Pro Asp Trp Ser

115 120 125

Ser Gly Pro Gln Thr Phe Ala Ala Ser Ile Ala Ala Thr Val Thr Gly

130 135 140

Val Ala His Ala Arg Leu Leu Trp Gly Pro Asp Ile Thr Val Arg Ala
145 150 155 160

Arg Gln Lys Phe Leu Gly Leu Leu Pro Gly Gln Pro Ala Ala His Arg

165 170 175 

Glu Asp Pro Leu Ala Glu Trp Leu Thr Trp Ser Val Glu Arg Phe Gly 

180 185 190 

Gly Arg Val Pro Gln Asp Val Glu Glu Leu Val Val Gly Gln Trp Thr

195 200 205

Ile Asp Pro Ala Pro Val Gly Met Arg Leu Asp Thr Gly Leu Arg Thr

210 215 220 

Val Gly Met Arg Tyr Val Asp Tyr Asn Gly Pro Ser Val Val Pro Asp 
225 230 235 240 

Trp Leu His Asp Glu Pro Thr Arg Arg Arg Val Cys Leu Thr Leu Gly 

245 250 255 

Ile Ser Ser Arg Glu Asn Ser Ile Gly Gln Val Ser Val Asp Asp Leu

260 265 270

Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp

275 280 285

Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr

290 295 300 

Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr 
305 310 315 320 

Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly

325 330 335

Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala

340 345 350

Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu

355 360 365

Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp

370 375 380 

Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala 
385 390 395 400 

Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly
405 410 415

Glu Arg Thr Ala Val Gly 

420 

<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 18 

Met Ser Thr Asp Ala Thr His Val Arg Leu Gly Arg Cys Ala Leu Leu 

1 5 10 15

Thr Ser Arg Leu Trp Leu Gly Thr Ala Ala Leu Ala Gly Gln Asp Asp

20 25 30 

Ala Asp Ala Val Arg Leu Leu Asp His Ala Arg Ser Arg Gly Val Asn 

35 40 45 

Cys Leu Asp Thr Ala Asp Asp Asp Ser Ala Ser Thr Ser Ala Gln Val

50 55 60 

Ala Glu Glu Ser Val Gly Arg Trp Leu Ala Gly Asp Thr Gly Arg Arg 
65 70 75 80 

Glu Glu Thr Val Leu Ser Val Thr Val Gly Val Pro Pro Gly Gly Gln

85 90 95

Val Gly Gly Gly Gly Leu Ser Ala Arg Gln Ile Ile Ala Ser Cys Glu

100 105 110 

Gly Ser Leu Arg Arg Leu Gly Val Asp His Val Asp Val Leu His Leu 

115 120 125 

Pro Arg Val Asp Arg Val Glu Pro Trp Asp Glu Val Trp Gln Ala Val

130 135 140 

Asp Ala Leu Val Ala Ala Gly Lys Val Cys Tyr Val Gly Ser Ser Gly 
145 150 155 160 

Phe Pro Gly Trp His Ile Val Ala Ala Gln Glu His Ala Val Arg Arg

165 170 175

His Arg Leu Gly Leu Val Ser His Gln Cys Arg Tyr Asp Leu Thr Ser

180 185 190

Arg His Pro Glu Leu Glu Val Leu Pro Ala Ala Gln Ala Tyr Gly Leu

195 200 205 

Gly Val Phe Ala Arg Pro Thr Arg Leu Gly Gly Leu Leu Gly Gly Asp 

210 215 220 

Gly Pro Gly Ala Ala Ala Ala Arg Ala Ser Gly Gln Pro Thr Ala Leu
225 230 235 240 

Arg Ser Ala Val Glu Ala Tyr Glu Val Phe Cys Arg Asp Leu Gly Glu 

245 250 255 

His Pro Ala Glu Val Ala Leu Ala Trp Val Leu Ser Arg Pro Gly Val 

260 265 270 

Ala Gly Ala Val Val Gly Ala Arg Thr Pro Gly Arg Leu Asp Ser Ala 

275 280 285 

Leu Arg Ala Cys Gly Val Ala Leu Gly Ala Thr Glu Leu Thr Ala Leu 

290 295 300 

Asp Gly Ile Phe Pro Gly Val Ala Ala Ala Gly Ala Ala Pro Glu Ala
305 310 315 320 

Trp Leu Arg 



<210> 19 
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala 
1 5 10 15 

Arg Leu Tyr Cys Phe Pro His Ala Gly Ala Ala Ala Asp Ser Tyr Leu 
20 25 30 

Asp Leu Ala Arg Ala Leu Ala Pro Glu Val Asp Val Trp Ala Val Gln 
35 40 45 

Tyr Pro Gly Arg Gln Asp Arg Arg Asp Glu Arg Ala Leu Gly Thr Ala 
50 55 60 

Gly Glu Ile Ala Asp Glu Val Ala Ala Val Leu Arg Asp Leu Val Gly 
65 70 75 80 

Glu Val Pro Phe Ala Leu Phe Gly His Ser Met Gly Ala Leu Val Ala 
85 90 95 

Tyr Glu Thr Ala Arg Arg Leu Glu Ala Arg Pro Gly Val Arg Pro Leu 
100 105 110 

Arg Leu Phe Val Ser Gly Gln Thr Ala Pro Arg Val His Glu Arg Arg 
115 120 125 

Thr Asp Leu Pro Asp Glu Asp Gly Leu Val Glu Gln Met Arg Arg Leu 
130 135 140 

Gly Val Ser Glu Ala Ala Leu Ala Asp Gln Gly Leu Leu Asp Met Ser 
145 150 155 160 

Leu Pro Val Leu Arg Ala Asp His Arg Val Leu Arg Ser Tyr Ala Trp 
165 170 175 

Gln Ala Gly Pro Pro Leu Arg Ala Gly Ile Thr Thr Leu Cys Gly Asp 
180 185 190 

Thr Asp Pro Leu Thr Thr Val Glu Asp Ala Gln Arg Trp Leu Pro Tyr 
195 200 205 

Ser Val Val Pro Gly Arg Thr Arg Thr Phe Pro Gly Gly His Phe Tyr 
210 215 220 

Leu Ala Asp His Val Gly Glu Val Ala Glu Ser Val Ala Pro Asp Leu 
225 230 235 240 

Leu Arg Leu Thr Pro Thr Gly 
245 



<210> 20 
<211> 189 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 20 



Ile Arg Val Gln Asp Asp Asp Ala Asp Arg Leu Ser Arg Asp Glu Leu 
1 5 10 15 

Thr Ser Ile Ala Leu Val Leu Leu Leu Ala Gly Phe Glu Ala Ser Val 
20 25 30 

Ser Leu Ile Gly Ile Gly Thr Tyr Leu Leu Leu Thr His Pro Asp Gln 
35 40 45 

Leu Ala Leu Val Arg Lys Asp Pro Ala Leu Leu Pro Gly Ala Val Glu 
50 55 60 

Glu Ile Leu Arg Tyr Gln Ala Pro Pro Glu Thr Thr Thr Arg Phe Ala 
65 70 75 80 

Thr Ala Glu Val Glu Ile Gly Gly Val Thr Ile Pro Ala Tyr Ser Thr 
85 90 95 

Val Leu Ile Ala Asn Gly Ala Ala Asn Arg Asp Pro Gly Gln Phe Pro 
100 105 110 

Asp Pro Asp Arg Phe Asp Val Thr Arg Asp Ser Arg Gly His Leu Thr 
115 120 125 

Phe Gly His Gly Ile His Tyr Cys Met Gly Arg Pro Leu Ala Lys Leu 
130 135 140 

Glu Gly Glu Val Ala Leu Gly Ala Leu Phe Asp Arg Phe Pro Lys Leu 
145 150 155 160 

Ser Leu Gly Phe Pro Ser Asp Glu Val Val Trp Arg Arg Ser Leu Leu 
165 170 175 

Leu Arg Gly Ile Asp His Leu Pro Val Arg Pro Asn Gly 
180 185 

<210> 21 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 
<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33 

<210> 22 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Complementary oligo 
<400> 22 

aattgtctag agctgaggcc agatctccga attcttaat 39 

<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 
tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 
cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 
gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 
gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 
aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 
ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 
tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 
ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 
<221> misc_feature 
<222> (1)...(528) 

<223> Sequence with codon changes as described in the 
specification at page 99, line 22 thru page 101, line 23 

<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60 

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120 

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180 

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240 

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300 

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360 

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420 

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480 

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528 

<210> 26 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 26 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 
gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 
ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 
gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 27 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 



<400> 27 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 



<210> 28 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 

<221> misc_feature 
<222> (1)...(291) 

<223> Sequence with codon changes as described in the 

specification at page 99, line 22 thru page 101, line 23 



<400> 28 

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60 

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120 

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180 

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240 

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291 



<210> 29 
<211> 24 
<212> DNA 
<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 29 

gaacaactcc tgtctgcggc cgcg 24 

<210> 30 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40 

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 31 

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 51 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 25 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 16 

<210> 34 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 34 